How to recover malware assembly codes from memory snapshots and execution traces
Jean-Yves Marion (LORIA)


Jean-Yves Marion - Laboratoire de Haute Sécurité (LHS)
Duqu: the precursor to the next Stuxnet
Duqu is a targeted attack
Started in June 2010?
Discovered in Sept. 2011 by CrySyS (Budapest)
See the Symantec white paper
Code injection
Duqu is similar to Stuxnet
➡ Same installation mechanisms and similar functionalities
➡ But anti-virus companies only detected it in Sept. 2011!
➡ None of the 43 anti-virus engines on VirusTotal was able to detect Duqu, even though they knew Stuxnet.
[Diagram: 0-day exploit, driver file (.sys), installer (.dll), decryption, Duqu main DLL, Service.exe]
The decryption routine of the payload installer
Unpack a UPX file
The main DLL code is now decrypted and unpacked, in memory only
Wave 1 → Decryption → Wave 2
Duqu is a self-modifying program
A common protection scheme for malware
Self-modifying program schema: Wave 1 → Decrypt → … → Decrypt → Wave 2
Self-modifying codes
A bare semantics
$\mu_0[c]$: binary $c$ loaded into memory $\mu_0$
$\mu_n$: memory, $\rho_n$: registers
$\rho_n(ip)$ returns the address of the next instruction to run
➡ Traces are obtained by code instrumentation: we use Pin (Intel)
We collect an execution trace of P:
For each executed instruction, we gather
– its memory address
– its machine instruction
$(\mu_0[c], \rho_0) \to (\mu_1, \rho_1) \to \dots \to (\mu_n, \rho_n) \to \dots$
Self-modifying codes
Dynamic typing of memories
$\tau(m) = (k_r, k_w, k_x)$ where $m$ is a memory address
$k_w$ is the writing level
$k_r$ is the reading level
$k_x$ is the execution level
$\tau_0(m) = (0, 0, 0)$
$(\mu_0[c], \rho_0, \tau_0) \to (\mu_1, \rho_1, \tau_1) \to \dots \to (\mu_n, \rho_n, \tau_n) \to \dots$
The execution level is the level of $\rho_n(ip)$, given by $\tau_n(\rho_n(ip))$
An instruction written at level k has an execution level of k+1
@a: mov esi, $index
@b: xor [@offset+esi], $key
@c: sub esi, 4
@d: jnz @b
@offset: [encrypted data]
Wave 1: @a, …, @d (decrypt)
Wave 2: @offset
The writing level $k_w$ of @offset is 1
The execution level of @offset is 2 because it is written by instructions in wave 1
So $k_x$ is 2
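The decryption loop @a to @d can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the function name `xor_decrypt`, the 32-bit little-endian word size and the sample key are chosen for illustration and are not taken from Duqu or from any packer.

```python
import struct

def xor_decrypt(data: bytes, index: int, key: int) -> bytes:
    """Mimic the loop @a..@d: xor the dword at [offset+esi] with key,
    for esi = index, index-4, ..., 4. The loop stops when esi reaches 0."""
    if index <= 0 or index % 4 != 0 or index + 4 > len(data):
        raise ValueError("index must be a positive multiple of 4 inside data")
    buf = bytearray(data)
    esi = index                                        # @a: mov esi, $index
    while True:
        (word,) = struct.unpack_from("<I", buf, esi)
        struct.pack_into("<I", buf, esi, word ^ key)   # @b: xor [@offset+esi], $key
        esi -= 4                                       # @c: sub esi, 4
        if esi == 0:                                   # @d: jnz @b falls through
            break
    return bytes(buf)
```

As in the assembly, the dword at offset 0 is never touched, because `jnz` falls through as soon as `esi` reaches 0. Since xor is its own inverse, applying the function twice with the same key restores the input.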
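Self-modifying codes
$k_w$ is the writing level
$k_x$ is the execution level
A program $c$ is self-modifying if, for some input, its execution level is > 1
Write rule:
$$\tau_{i+1}(m) = \begin{cases} (k_r, k+1, k_x) & \text{if } m \text{ is written and } \tau_i(m) = (k_r, k_w, k_x) \\ \tau_i(m) & \text{otherwise} \end{cases}$$
Here $\rho_i(ip)$ points to an instruction like mov [m],eax, and the execution level of that instruction is $k + 1$
Execution rule:
$$\tau_{i+1}(m) = \begin{cases} (k_r, k, k+1) & \text{if } m = \rho_i(ip) \text{ and } \tau_i(\rho_i(ip)) = (k_r, k, k_x) \\ \tau_i(m) & \text{otherwise} \end{cases}$$
Similar to the phase semantics of Dalla Preda, Giacobazzi, and Debray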
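The write and execution rules can be replayed over a recorded trace. The sketch below is an illustration only: the name `type_trace` and the trace format (one `(ip, written_addresses)` pair per executed instruction, with one address per instruction) are assumptions. The reading level is left at 0 because the slides give no update rule for it.

```python
from collections import defaultdict

def type_trace(trace):
    """trace: iterable of (ip, written_addresses), one entry per executed instruction.
    Returns (levels, waves): levels maps an address to [kr, kw, kx],
    waves lists the execution level of each trace entry."""
    levels = defaultdict(lambda: [0, 0, 0])
    waves = []
    for ip, written in trace:
        k = levels[ip][1]          # writing level of the instruction being run
        levels[ip][2] = k + 1      # execution rule: kx becomes k + 1
        for m in written:
            levels[m][1] = k + 1   # write rule: kw becomes the writer's execution level
        waves.append(k + 1)
    return levels, waves
```

On the decryption loop example, the instructions @a to @d run at level 1, the address they write gets a writing level of 1, and executing it later yields level 2, which is where wave 2 starts.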
Packer protections
Example (4/5)
• hostname packed with Themida
[Figures: the different code waves of the Themida packer and of the Yoda packer]
Fig. 7.2: Analysis results
Binary name k k Blind Decrypt Check Scrambled
hostname.exe (original) 1 1
!EPack_1..exe 2 2 X X
acprotect-hostname.exe 18 882 X X X X
aspack-hostname.exe 2 3 X X X X
enigma_protector_1.16.exe 5 24 X X X X
exefog_1.1.exe 3 5 X X X
expressor-hostname.exe 2 3 X X
fsg.exe 2 2 X X X
mew11.exe 2 2 X X X
molebox-hostname.exe 3 5 X X X X
morphine_1.9.exe 3 3 X X X
nakedpack.exe 2 2 X
npack-hostname.exe 2 2 X
nspack.exe 3 4 X X X
packman_1..exe 2 2 X X X
pec2-hostname.exe 3 4 X X X X
pelock-hostname.exe 9 16 X X X X
pepack.exe 1 1 X
pespin-hostname.exe 4 38 X X X X
petite.exe 2 2 X X
rlpack_1.17_full_version.exe 2 2 X X X
rlpack-hostname.exe 2 2 X
telock_.98.exe 2 2 X
themida_1.8.5.2.exe 11 164 X X X X
upx-hostname.exe 2 2 X X
vmprotect-hostname.exe 1 1 X
winupack-hostname.exe 3 4 X X X X
Yodas_Crypter_v1.3.exe 4 4 X X X X
yp-1.02-hostname.exe 4 6 X X X X
Legend:
Where are we?
$(\mu_0[c], \rho_0, \tau_0) \to (\mu_1, \rho_1, \tau_1) \to \dots \to (\mu_n, \rho_n, \tau_n) \to \dots$
A dynamically typed memory trace, which defines a sequence of waves: Wave 1, Wave 2, …
Can we recover the assembly code of wave K?
Can we reconstruct the full CFG?
The inputs:
– An execution trace inside wave K
– The snapshot of the memory at wave K
The problem and its inputs
Can we recover the assembly code of this wave?
The inputs:
– An execution trace inside wave K
– The snapshot of the memory at wave K
[Figure: snapshot of the memory at the beginning of wave 5]
Dynamic vs. static analysis
[Figure: a trace obtained by dynamic analysis (dynamic typed memory trace); undiscovered code appears in white boxes]
Why is it difficult to recover a CFG in x86?
Indirect jumps
100: jmp eax
What is the set of possible values of eax?
– Fuzzing
– We need a robust approximation of the x86 semantics
– Abstract interpretation
– SMT solver
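An execution trace gives a cheap under-approximation of that set: the targets actually observed. The sketch below assumes a trace given as a list of executed addresses and a known set of indirect-jump sites; both the name `observed_targets` and this format are hypothetical.

```python
from collections import defaultdict

def observed_targets(trace, indirect_sites):
    """trace: list of executed instruction addresses, in order.
    indirect_sites: set of addresses holding an indirect jump (e.g. jmp eax).
    Returns, for each site, the set of addresses executed right after it."""
    targets = defaultdict(set)
    for here, nxt in zip(trace, trace[1:]):
        if here in indirect_sites:
            targets[here].add(nxt)
    return targets
```

Targets that never occur in the recorded runs are missed, which is why static techniques such as the ones listed above remain necessary.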
Junk code insertion
Junk code insertion at the expected return address
100: call @a
     junk code
125: pop esi
@a:  … modifies the return address (to 125)
How to determine the return address of a call?
See Debray’s paper
Yet another difficulty
01006e7a  fe 04 0b        inc byte [ebx+ecx]
01006e7d  eb ff           jmp +1
01006e7e  ff c9           dec ecx
01006e80  7f e6           jg 01006e68
01006e82  8b c1           mov eax, ecx
Figure 1. Overlapping assembly in tELock.
    010059f0  89 f9           mov ecx, edi
,=< 010059f2  79 07           jns +9
|   010059f4  0f b7 07        movzx eax, word [edi]
|   010059f7  47              inc edi
|   010059f8  50              push eax
|   010059f9  47              inc edi
|   010059fa  b9 57 48 f2 ae  mov ecx, aef24857
`-> 010059fb  57              push edi
    010059fc  48              dec eax
    010059fd  f2 ae           repne scasb
    010059ff  55              push ebp
Figure 2. Overlapping assembly in UPX.
2.2.1 tELock 0.99
tELock 0.99 uses an overlapping technique simply to obfuscate the code. Figure 1 shows a recursive
disassembly starting at address 01006e7a. The jmp +1 instruction at address 01006e7d, encoded on
the two bytes eb ff, jumps to address 01006e7d + 1, where a dec ecx instruction (ff c9) begins. The two
instructions share the byte ff at address 01006e7d + 1.
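Overlaps of this kind can be detected mechanically once instruction addresses and lengths are known. The following sketch is illustrative: the function name `overlapping_pairs` and the `(address, length)` input format are assumptions, and the lengths in the usage example are read off the byte columns of Figure 1.

```python
def overlapping_pairs(instructions):
    """instructions: list of (address, length) pairs from a disassembly or a trace.
    Returns the pairs whose byte ranges intersect without being the same instruction."""
    insns = sorted(set(instructions))
    pairs = []
    for i, (a, la) in enumerate(insns):
        for b, lb in insns[i + 1:]:
            if b >= a + la:
                break  # sorted by address: no later instruction can overlap (a, la)
            pairs.append(((a, la), (b, lb)))
    return pairs

# Instructions of Figure 1 (tELock)
telock = [(0x01006E7A, 3), (0x01006E7D, 2), (0x01006E7E, 2),
          (0x01006E80, 2), (0x01006E82, 2)]
print(overlapping_pairs(telock))
```

On the tELock sample, the only pair reported is the jmp at 01006e7d together with the dec ecx at 01006e7e.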
2.2.2 UPX
UPX uses overlapping to reduce the size of the final packed binary (Figure 2). The unpacker uses a conditional
jump to split the control flow into two overlapping blocks, which realign after a few instructions.
(TODO: explain the two branches, and briefly why they are useful)
2.2.3 Overlapping in state-of-the-art disassemblers
Existing disassemblers, even when doing recursive traversal, assume that code cannot overlap and fail to display
the resulting disassembly.
With IDA Pro (v6.3), the tELock example looks as follows:
01006E7A inc byte ptr [ebx+ecx]
01006E7D jmp short near ptr loc_1006E7D+1
01006E7D ;
01006E7F db 0C9h ;
01006E80 db 7Fh ;
01006E81 db 0E6h ;
01006E82 db 8Bh ;
01006E83 db 0C1h ;
With Radare (TODO: recursive?), the tELock example is disassembled as follows:
01006e7a  fe040b  inc byte [ebx+ecx]
01006e7d  ebff    jmp 6e7e
01006e7f  c9      leave
01006e80  7fe6    jg 6e68
01006e82  8bc1    mov eax, ecx
Neither is able to follow the jmp: its target has already been disassembled as part of another instruction
and is thus deemed invalid.
IDA fails because of jmp +1
BB [0x4 -> 0x5] (0x2)
0x4 dec ecx
BB [0x3 -> 0x4] (0x2)
0x3 jmp 0x4
BB [0x6 -> 0x7] (0x2)
0x6 jg 0x ee
BB [0x0 -> 0x2] (0x3)
0x0 inc byte [ebx+ecx]
BB [0x8 -> 0x9] (0x2)
0x8 mov eax, ecx
Figure 4. Control flow graph for the tELock sample
Another example of mis-alignment
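Bytes in common!
mov ecx, edi
jns +9
Fall-through branch:
  movzx eax, word [edi]
  inc edi
  push eax
  inc edi
  mov ecx, aef24857
Jump-taken branch:
  push edi
  dec eax
  repne scasb
Both branches continue with:
  push ebp
The two branches share 4 bytes (57 48 f2 ae)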
Let’s recap the problem
W1 W2 W3 W4 W5
Snapshot of the memory at the beginning of wave 5
Goal: reconstruct the full CFG
Problem inputs
– Snapshot of the memory at the beginning of wave 5
– An execution trace
A path in the woods
Junk code insertion after a call
100: call @a
     junk code
125: pop esi
@a:  pop ebp
     modify the return address
@b:  ret
Trace: 100: call @a, @a: pop esi, …, @b: ret; 125: pop esi; …
A trace automatically provides the address 125
It is junk code only if it is not reachable
See the paper by Kruegel et al. (USENIX 2004) for another approach
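The way a trace reveals the real return address can be sketched as follows. The trace format `(address, length, kind)`, the name `return_sites` and the 5-byte call length are assumptions made for the example. Pairing each ret with the most recent call through a stack is a simplification that obfuscated code may deliberately break.

```python
def return_sites(trace):
    """trace: list of (address, length, kind), kind in {"call", "ret", "other"}.
    For each call matched with a ret, returns
    (call address, fall-through address, address actually executed after the ret)."""
    stack, sites = [], []
    for i, (addr, length, kind) in enumerate(trace):
        if kind == "call":
            stack.append((addr, addr + length))
        elif kind == "ret" and stack and i + 1 < len(trace):
            call_addr, fallthrough = stack.pop()
            sites.append((call_addr, fallthrough, trace[i + 1][0]))
    return sites

# 100: call @a (assumed 5 bytes long), @a = 300, @b = 310, execution resumes at 125
trace = [(100, 5, "call"), (300, 1, "other"), (310, 1, "ret"), (125, 1, "other")]
print(return_sites(trace))
```

When the observed return address (125) differs from the fall-through address (105), the bytes in between are candidates for junk code. They only qualify as junk if no other path reaches them.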
Method for misalignment
… 89 F9 79 07 0F B7 07 47 50 47 B9 57 48 F2 AE 55 …
mov ecx, edi
jns +9
Jump-taken branch:
  push edi
  dec eax
  repne scasb
Fall-through branch:
  movzx eax, word [edi]
  inc edi
  push eax
  inc edi
  mov ecx, aef24857
Common block:
  push ebp
An obfuscation similar to UPX
1/ The CFG construction follows the trace
2/ Then, we search for missing code
3/ We split blocks
Misalignment can come from indirect jumps … traces are then useful!
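Steps 1/ and 3/ can be sketched as building basic blocks from the trace and splitting them wherever the trace forks, joins or jumps. This is an illustration, not the authors' implementation: `build_blocks` and the `(address, length)` trace format are assumptions, and step 2/ (searching for code the trace never reached) is not covered.

```python
def build_blocks(trace):
    """trace: list of (address, length) for executed instructions, in order.
    Returns basic blocks as lists of addresses, split at every fork, join or jump."""
    succ, pred = {}, {}
    for (a, _), (b, _) in zip(trace, trace[1:]):
        succ.setdefault(a, set()).add(b)
        pred.setdefault(b, set()).add(a)
    length = dict(trace)

    def starts_block(addr):
        ps = pred.get(addr, set())
        if len(ps) != 1:
            return True                      # entry point or join
        (p,) = ps
        return len(succ[p]) != 1 or p + length[p] != addr   # fork or jump

    blocks, block_of = [], {}
    for addr in sorted(length):
        if starts_block(addr):
            blocks.append([addr])
            block_of[addr] = blocks[-1]
        else:
            (p,) = pred[addr]                # unique, sequential predecessor
            block_of[p].append(addr)
            block_of[addr] = block_of[p]
    return blocks
```

Fed with a trace that goes through both branches of the UPX-like example, this yields four blocks: the head (mov, jns), the fall-through branch, the jump-taken branch, and the shared push ebp.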
The overall method (work in progress)
• A partial CFG is an incomplete CFG

• Two partial CFGs are in conflict if they contain two misaligned instructions.

• Traces define a set of partial CFGs which are in conflict.

[Figure: partial CFGs for the UPX-like example: (mov ecx, edi; jns +9), (movzx eax, word [edi]; inc edi; push eax; inc edi; mov ecx, aef24857), (push edi; dec eax; repne scasb), (push ebp). Misaligned blocks share the same addresses.]
• Edges between partial CFGs indicate misalignment

• Then we can synchronize partial CFGs

• There are orphan partial CFGs

• They are OK if there is an edge to a valid address

• Statistical recognition is useful at this stage
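The conflict relation between partial CFGs can be sketched as follows. A partial CFG is represented here simply as a set of `(address, length)` instructions. The names `misaligned` and `conflict_edges` are hypothetical, and the sketch says nothing about how the method then synchronizes the graphs.

```python
from itertools import combinations

def misaligned(i1, i2):
    """Two instructions are misaligned if their bytes overlap but they start
    at different addresses."""
    (a, la), (b, lb) = i1, i2
    return a != b and a < b + lb and b < a + la

def conflict_edges(partial_cfgs):
    """partial_cfgs: dict mapping a name to a set of (address, length).
    Returns the pairs of names whose partial CFGs are in conflict."""
    edges = set()
    for (n1, c1), (n2, c2) in combinations(sorted(partial_cfgs.items()), 2):
        if any(misaligned(x, y) for x in c1 for y in c2):
            edges.add((n1, n2))
    return edges
```

With the two UPX-like branches and the shared tail as three partial CFGs, the only conflict edge links the two branches, because `mov ecx, aef24857` overlaps `push edi`, `dec eax` and `repne scasb`.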
Conclusion and questions
• We are developing a disassembler for self-modifying code:

BinVizz: visualization of each code wave from a trace

How to recover malware assembly codes from memory snapshots and execution traces

  • 1. How to recover malware assembly codes Jean-Yves Marion LORIA !
  • 2. Jean-Yves Marion - Laboratoire de Haute Sécurité (LHS) Duqu : The precursor to the next Stuxnet Duqu is Targeted attacks Start in June 2010 ? Discover in Sept. 2011 by Crysys (Budapest) See white paper of Symantec Code injection Duqu is similar to Stuxnet ➡ Same installation mechanisms and Similar functionalities ➡ But Anti-Virus companies detect it in Sept 2011 ! ➡ None of 43 anti-virus of VirusTotal was able to detect Duqu knowing Stuxnet. 0-day exploit Driver File (.sys) Installer (.dll) Decryption DUQU Main DLL Service.exe
  • 3. The decryption routine of the payload installer. Unpack a UPX file. The main DLL code is now decrypted and unpacked, in memory only. Wave 1, Decryption, Wave 2. Duqu is a self-modifying program
  • 4. A common protection scheme for malware. Self-modifying program schema: Wave 1, Decrypt, Wave 2, Decrypt, …, Decrypt, payload
  • 5. Self-modifying codes: a bare semantics. µ0[c]: binary c loaded into memory µ0. µn: memory; ρn: registers; ρn(ip) returns the address of the next instruction to run. Traces ➡ Traces are obtained by code instrumentation: we use Pin (Intel). We collect an execution trace of P: for each executed instruction, we gather – its memory address – its machine instruction. (µ0[c], ρ0) → (µ1, ρ1) → … → (µn, ρn) → …
  • 6. Self-modifying codes: dynamic typing of memories. τ(m) = (kr, kw, kx), where m is a memory address: kr is the reading level, kw is the writing level, kx is the execution level. Initially τ0(m) = (0, 0, 0). (µ0[c], ρ0, τ0) → (µ1, ρ1, τ1) → … → (µn, ρn, τn) → … The execution level is the level of ρn(ip), given by τn(ρn(ip))
  • 7. Self-modifying codes: dynamic typing of memories. τ(m) = (kr, kw, kx), where m is a memory address: kw is the writing level, kx is the execution level. Initially τ0(m) = (0, 0, 0). (µ0[c], ρ0, τ0) → (µ1, ρ1, τ1) → … → (µn, ρn, τn) → … An instruction written at level k has an execution level of k+1. Example:
      @a: mov esi,$index
      @b: xor [@offset+esi],$key
      @c: sub esi,4
      @d: jnz @b
      @offset: [encrypted data]
    Wave 1 (@a,…,@d) decrypts wave 2 (@offset). For @offset, kw is 1: its execution level is 2 because it is written by instructions in wave 1. So kx is 2
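To make the wave-1 loop on this slide concrete, here is a minimal Python sketch that replays it on a byte array. It assumes 32-bit little-endian XOR operands (suggested by `sub esi,4` but not stated on the slide); the function name `xor_decrypt` and the memory layout are hypothetical, chosen only for illustration.

```python
import struct

def xor_decrypt(memory: bytearray, offset: int, index: int, key: int) -> list:
    """Replay the loop @a..@d; return the addresses written (dword granularity)."""
    written = []
    esi = index                                    # @a: mov esi, $index
    while True:
        addr = offset + esi
        (word,) = struct.unpack_from("<I", memory, addr)
        struct.pack_into("<I", memory, addr, word ^ key)   # @b: xor [@offset+esi], $key
        written.append(addr)
        esi = (esi - 4) & 0xFFFFFFFF               # @c: sub esi, 4
        if esi == 0:                               # @d: jnz @b (loop while esi != 0)
            break
    return written

# Hypothetical layout: 8 bytes of "code", then @offset = 8.
memory = bytearray(28)
memory[12:28] = bytes(range(1, 17))
print(xor_decrypt(memory, offset=8, index=16, key=0xDEADBEEF))
```

Every address returned here is written by wave 1, so in the typing of the slide it gets kw = 1 and, if executed later, kx = 2. Note that the loop stops when esi reaches 0, so the dword at `@offset+0` itself is not touched.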
  • 8. Self-modifying codes. kw is the writing level, kx is the execution level. A self-modifying program c is a program whose execution level is > 1 for some input. Write rule: τi+1(m) = (kr, k + 1, kx) if m is written and τi(m) = (kr, kw, kx); τi+1(m) = τi(m) otherwise. Execution rule: τi+1(m) = (kr, k, k + 1) if m = ρi(ip) and τi(ρi(ip)) = (kr, k, kx); τi+1(m) = τi(m) otherwise. Similar to the phase semantics of Preda, Giacobazzi, and Debray. The execution level is k + 1. ρi(ip) points to an instruction like mov [m],eax
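The level bookkeeping can be sketched in a few lines of Python. This is only an illustration under two assumptions: a write stamps the target with the execution level of the writing instruction (which matches the slide 7 example, where a wave-1 writer gives kw = 1), and the reading level kr is ignored, as on this slide. The trace format `(ip, written_addresses)` and the name `wave_levels` are hypothetical.

```python
from collections import defaultdict

def wave_levels(trace):
    """trace: iterable of (ip, written_addresses). Returns (levels, kw, kx)."""
    kw = defaultdict(int)      # writing level per address, initially 0
    kx = defaultdict(int)      # execution level per address, initially 0
    levels = []                # execution level (wave) of each trace step
    for ip, writes in trace:
        kx[ip] = kw[ip] + 1    # execution rule: written at level k, runs at k + 1
        levels.append(kx[ip])
        for m in writes:
            kw[m] = kx[ip]     # assumption: a write records the writer's level
    return levels, kw, kx

# Hypothetical addresses for @a..@d and @offset.
A, B, C, D, OFFSET = 0x100, 0x103, 0x10A, 0x10D, 0x200
trace = [(A, []), (B, [OFFSET]), (C, []), (D, []), (OFFSET, [])]
levels, kw, kx = wave_levels(trace)
print(levels, max(levels) > 1)   # the program is self-modifying if some level > 1
```

Grouping consecutive trace steps by their level gives the sequence of waves; a program whose maximum level stays at 1 never executes code it has written.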
  • 9. Packer protections. Example (4/5): hostname packed with Themida. Different code waves with their relations. Themida packer, Yoda packer, UPX
  • 10. Fig. 7.2: Analysis results
    Binary name                    k    k    X marks among Blind, Decrypt, Check, Scrambled
    hostname.exe (original)        1    1
    !EPack_1..exe                  2    2    X X
    acprotect-hostname.exe         18   882  X X X X
    aspack-hostname.exe            2    3    X X X X
    enigma_protector_1.16.exe      5    24   X X X X
    exefog_1.1.exe                 3    5    X X X
    expressor-hostname.exe         2    3    X X
    fsg.exe                        2    2    X X X
    mew11.exe                      2    2    X X X
    molebox-hostname.exe           3    5    X X X X
    morphine_1.9.exe               3    3    X X X
    nakedpack.exe                  2    2    X
    npack-hostname.exe             2    2    X
    nspack.exe                     3    4    X X X
    packman_1..exe                 2    2    X X X
    pec2-hostname.exe              3    4    X X X X
    pelock-hostname.exe            9    16   X X X X
    pepack.exe                     1    1    X
    pespin-hostname.exe            4    38   X X X X
    petite.exe                     2    2    X X
    rlpack_1.17_full_version.exe   2    2    X X X
    rlpack-hostname.exe            2    2    X
    telock_.98.exe                 2    2    X
    themida_1.8.5.2.exe            11   164  X X X X
    upx-hostname.exe               2    2    X X
    vmprotect-hostname.exe         1    1    X
    winupack-hostname.exe          3    4    X X X X
    Yodas_Crypter_v1.3.exe         4    4    X X X X
    yp-1.02-hostname.exe           4    6    X X X X
    Legend:
  • 11. Where are we? (µ0[c], ρ0, τ0) → (µ1, ρ1, τ1) → … → (µn, ρn, τn) → … Dynamic typed memory trace, which defines a sequence of waves: Wave 1, Decrypt, Wave 2, Decrypt, …, Decrypt, Wave K. Can we recover the assembly code of wave K? Can we reconstruct the full CFG? The inputs: an execution trace inside wave K; the snapshot of the memory at wave K
  • 12. The problem and its inputs Can we recover the assembly code of this wave ? The inputs: An execution trace inside wave K The snapshot of the memory at wave K Snapshot of the memory at the beginning of wave 5
  • 13. Dynamic vs Static analysis A trace obtained by dynamic analysis Dynamic typed memory trace Undiscovered code in white boxes
  • 14. Why is it difficult to recover a CFG in x86 ?
  • 15. Indirect jumps. 100: jmp eax. What is the set of possible values of eax? – Fuzzing – We need a robust approximation of x86 semantics – Abstract interpretation – SMT solver
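An execution trace gives at least a lower bound on the answer: the targets that were actually taken. The sketch below collects them, assuming a trace stored as a list of `(address, instruction)` pairs as on slide 5; `observed_targets` and the sample addresses are hypothetical. It does not replace the techniques listed on the slide, which aim at targets that no run has exercised.

```python
from collections import defaultdict

def observed_targets(trace, indirect_sites):
    """Map each indirect-jump address to the set of successors seen in the trace."""
    targets = defaultdict(set)
    # Pair each executed instruction with the one executed right after it.
    for (addr, _), (next_addr, _) in zip(trace, trace[1:]):
        if addr in indirect_sites:
            targets[addr].add(next_addr)
    return dict(targets)

trace = [(100, "jmp eax"), (200, "nop"), (100, "jmp eax"), (300, "nop")]
print(observed_targets(trace, indirect_sites={100}))
```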
  • 16. Junk code insertion. Junk code is inserted at the expected return address. 100: call @a; junk code; 125: pop esi; @a: … modifies the return address (125). How to determine the return address of a call? See Debray’s paper
  • 17. Yet another difficulty: mis-alignment
      01006e7a  fe 04 0b  inc byte [ebx+ecx]
      01006e7d  eb ff     jmp +1
      01006e7e  ff c9     dec ecx
      01006e80  7f e6     jg 01006e68
      01006e82  8b c1     mov eax, ecx
    Figure 1. Overlapping assembly in tELock.
      010059f0      89 f9           mov ecx, edi
      010059f2  ,=< 79 07           jns +9
      010059f4  |   0f b7 07        movzx eax, word [edi]
      010059f7  |   47              inc edi
      010059f8  |   50              push eax
      010059f9  |   47              inc edi
      010059fa  |   b9 57 48 f2 ae  mov ecx, aef24857
      010059fb  `-> 57              push edi
      010059fc      48              dec eax
      010059fd      f2 ae           repne scasb
      010059ff      55              push ebp
    Figure 2. Overlapping assembly in UPX.
    2.2.1 tELock0.99. tELock0.99 uses an overlapping technique simply to obfuscate the code, as follows. Figure 1 shows a recursive disassembly taken from the address 01006e7a. There is a jmp +1 instruction at address 01006e7d, coded on the two bytes eb ff, that jumps to the address 01006e7d + 1, which is a dec ecx instruction (ff c9) sharing the byte ff at address 01006e7d + 1 with the jmp instruction.
    2.2.2 UPX. UPX uses overlapping to optimize the size of the final packed binary (Figure 2). The unpacker part uses a conditional jump to separate the control flow into two overlapping blocks, which both realign after a few instructions. (TODO: briefly explain the two branches and why they are useful)
    2.2.3 Overlapping in state-of-the-art disassemblers. Existing disassemblers, even when doing recursive traversal, assume that code cannot overlap and fail to display the resulting disassembly.
    With IDA Pro (v6.3), the tELock example looks as follows:
      01006E7A  inc byte ptr [ebx+ecx]
      01006E7D  jmp short near ptr loc_1006E7D+1
      01006E7D ;
      01006E7F  db 0C9h ;
      01006E80  db 7Fh ;
      01006E81  db 0E6h ;
      01006E82  db 8Bh ;
      01006E83  db 0C1h ;
    With Radare (TODO: recursive?), the tELock example is disassembled as follows:
      01006e7a  fe040b  inc byte [ebx+ecx]
      01006e7d  ebff    jmp 6e7e
      01006e7f  c9      leave
      01006e80  7fe6    jg 6e68
      01006e82  8bc1    mov eax, ecx
    Neither is able to follow the jmp: the target of the jmp is already disassembled as part of another assembly instruction and is thus deemed invalid. IDA fails because of jmp +1.
      BB [0x4 -> 0x5] (0x2)  0x4 dec ecx
      BB [0x3 -> 0x4] (0x2)  0x3 jmp 0x4
      BB [0x6 -> 0x7] (0x2)  0x6 jg 0x ee
      BB [0x0 -> 0x2] (0x3)  0x0 inc byte [ebx+ecx]
      BB [0x8 -> 0x9] (0x2)  0x8 mov eax, ecx
    Figure 4. Control flow graph for the tELock sample
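The tELock overlap can be reproduced with a toy recursive traversal over the ten bytes of Figure 1. The sketch below is not a disassembler: `decode` is a hypothetical helper that recognizes only the five encodings appearing in the figure, and the traversal simply follows the unconditional jmp while treating jg as falling through.

```python
CODE = bytes.fromhex("fe040bebffc97fe68bc1")   # bytes of Figure 1
BASE = 0x01006E7A                              # address of the first byte

def rel8(byte):
    """Interpret a byte as a signed 8-bit displacement."""
    return byte - 256 if byte >= 128 else byte

def decode(code, base, addr):
    """Return (length, text, jump_target) for the few encodings of Figure 1."""
    off = addr - base
    if code[off:off + 3] == b"\xfe\x04\x0b":
        return 3, "inc byte [ebx+ecx]", None
    if code[off] == 0xEB:                      # jmp rel8
        target = addr + 2 + rel8(code[off + 1])
        return 2, f"jmp {target:08x}", target
    if code[off:off + 2] == b"\xff\xc9":
        return 2, "dec ecx", None
    if code[off] == 0x7F:                      # jg rel8, not followed here
        target = addr + 2 + rel8(code[off + 1])
        return 2, f"jg {target:08x}", None
    if code[off:off + 2] == b"\x8b\xc1":
        return 2, "mov eax, ecx", None
    raise ValueError(f"encoding not covered by this sketch at {addr:08x}")

def follow(code, base, start):
    """Decode along the control flow, allowing instructions to overlap."""
    listing, seen, addr = [], set(), start
    while 0 <= addr - base < len(code) and addr not in seen:
        seen.add(addr)
        length, text, target = decode(code, base, addr)
        listing.append((addr, length, text))
        addr = target if target is not None else addr + length
    return listing

def shared_bytes(listing):
    """Addresses of bytes that belong to more than one decoded instruction."""
    owner, shared = {}, set()
    for addr, length, _ in listing:
        for byte in range(addr, addr + length):
            if owner.setdefault(byte, addr) != addr:
                shared.add(byte)
    return shared

for addr, length, text in follow(CODE, BASE, BASE):
    print(f"{addr:08x}  {text}")
print([f"{b:08x}" for b in shared_bytes(follow(CODE, BASE, BASE))])
```

The only design point is that the traversal keys on instruction start addresses and never assumes that bytes belong to a single instruction, which is exactly the assumption the slide blames for the IDA and Radare output.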
  • 18. Another example of mis-alignment: UPX
      010059f0      89 f9           mov ecx, edi
      010059f2  ,=< 79 07           jns +9
      010059f4  |   0f b7 07        movzx eax, word [edi]
      010059f7  |   47              inc edi
      010059f8  |   50              push eax
      010059f9  |   47              inc edi
      010059fa  |   b9 57 48 f2 ae  mov ecx, aef24857
      010059fb  `-> 57              push edi
      010059fc      48              dec eax
      010059fd      f2 ae           repne scasb
      010059ff      55              push ebp
    Figure 2. Overlapping assembly in UPX.
    2.2.2 UPX. UPX uses overlapping to optimize the size of the final packed binary (Figure 2). The unpacker part uses a conditional jump to separate the control flow into two overlapping blocks, which both realign after a few instructions. (TODO: briefly explain the two branches and why they are useful)
    Diagram: after mov ecx,edi and jns +9, the fall-through branch is movzx eax, [edi]; inc edi; push eax; inc edi; mov ecx, aef24857; the taken branch is push edi; dec eax; repne scasb. The two branches share 4 bytes and are re-synchronized at push ebp.
  • 19. Let’s recap the problem. TRACER: from the first instruction to the last instruction, through waves W1, W2, W3, W4, W5. Snapshot of the memory at the beginning of wave 5. Goal: reconstruct the full CFG
  • 20. Problem inputs Snapshot of the memory at the beginning of wave 5 An execution trace A path in the woods
  • 21. Junk code insertion after a call. 100: call @a; junk code; 125: pop esi; @a: pop ebp; modify the return address; @b: ret. Trace: 100: call @a, @a: pop esi, …, @b: ret; 125: pop esi; … A trace automatically provides the address 125. It is junk code only if it is not reachable. See the paper of Kruegel et al., USENIX 2004, for another approach
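One way to read the actual return address off a trace is a shadow stack: remember the fall-through address of each call and compare it with the instruction executed right after the matching ret. The sketch below assumes a trace of `(address, size, mnemonic)` records; the addresses of @a and @b, the instruction sizes, and the name `modified_returns` are hypothetical.

```python
def modified_returns(trace):
    """Return (expected, actual) pairs where a ret did not go back after its call."""
    shadow = []        # fall-through addresses of the calls still open
    found = []
    for (addr, size, mnem), (next_addr, _, _) in zip(trace, trace[1:]):
        if mnem == "call":
            shadow.append(addr + size)
        elif mnem == "ret" and shadow:
            expected = shadow.pop()
            if next_addr != expected:
                # Bytes in [expected, next_addr) are junk candidates only.
                found.append((expected, next_addr))
    return found

trace = [
    (100, 5, "call"),   # 100: call @a (5-byte call assumed)
    (300, 1, "pop"),    # @a: pops the pushed return address
    (301, 3, "add"),    # modifies it
    (304, 1, "push"),   # pushes it back
    (310, 1, "ret"),    # @b: ret
    (125, 1, "pop"),    # execution resumes at 125, not at 105
]
print(modified_returns(trace))
```

As the slide says, the skipped bytes are junk only if they are not reachable by another path, so the result is a list of candidates and not a verdict.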
  • 22. Method for mis-alignment. … 89 F9 79 07 0F B7 07 47 50 47 B9 57 48 F2 AE 55 … An obfuscation similar to UPX. Common prefix: mov ecx,edi; jns +9. Taken branch: push edi; dec eax; repne scasb; push ebp. Fall-through branch: movzx eax, [edi]; inc edi; push eax; inc edi; mov ecx, aef24857; push ebp. 1/ The CFG construction follows the trace 2/ Then, we search for missing code 3/ We split blocks
  • 23. Method for mis-alignment. … 89 F9 79 07 0F B7 07 47 50 47 B9 57 48 F2 AE 55 … An obfuscation similar to UPX. Common prefix: mov ecx,edi; jns +9. Taken branch: push edi; dec eax; repne scasb. Fall-through branch: movzx eax, [edi]; inc edi; push eax; inc edi; mov ecx, aef24857; push ebp. 1/ The CFG construction follows the trace 2/ Then, we search for missing code 3/ We split blocks. Mis-alignment can also come from an indirect jump; traces are then useful!
  • 24. The overall method (work in progress) • A partial CFG is an incomplete CFG • Two partial CFGs are in conflict if they contain two mis-aligned instructions • Traces define a set of partial CFGs which are in conflict. Example: mov ecx,edi; jns +9; then movzx eax, [edi]; inc edi; push eax; inc edi; mov ecx, aef24857; push ebp on one side and push edi; dec eax; repne scasb; push ebp on the other, which share the same addresses • Edges between partial CFGs indicate mis-alignment • Then we can synchronize partial CFGs • There are orphan partial CFGs • They are OK if there is an edge to a valid address • Statistical recognition is useful at this stage
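The conflict test between two partial CFGs can be sketched as an overlap check on instruction byte ranges. Here a partial CFG is reduced to a map from instruction start address to instruction length, which is an assumption made for illustration (edges are ignored), and `conflicts` is a hypothetical name. The data are the two UPX branches of Figure 2.

```python
def conflicts(cfg_a, cfg_b):
    """Return (pairs, shared): mis-aligned instruction pairs and the bytes they share."""
    owner = {}                                  # byte address -> start of its instruction in A
    for start, length in cfg_a.items():
        for byte in range(start, start + length):
            owner[byte] = start
    pairs, shared = set(), set()
    for start, length in cfg_b.items():
        for byte in range(start, start + length):
            # Same byte, different instruction start: the two decodings are mis-aligned.
            if byte in owner and owner[byte] != start:
                pairs.add((owner[byte], start))
                shared.add(byte)
    return sorted(pairs), sorted(shared)

# Fall-through path and taken path of the jns at 010059f2 (lengths from the hex dump).
fall_through = {0x010059F0: 2, 0x010059F2: 2, 0x010059F4: 3, 0x010059F7: 1,
                0x010059F8: 1, 0x010059F9: 1, 0x010059FA: 5, 0x010059FF: 1}
taken = {0x010059F0: 2, 0x010059F2: 2, 0x010059FB: 1, 0x010059FC: 1,
         0x010059FD: 2, 0x010059FF: 1}

pairs, shared = conflicts(fall_through, taken)
print([(hex(a), hex(b)) for a, b in pairs], len(shared))
```

Instructions that start at the same address in both maps, such as the final push ebp, are not reported: that is where the two partial CFGs can be synchronized.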
  • 25. Conclusion and Questions • We develop a disassembler for self-modifying code • BinVizz: visualization of each code wave from a trace