2023-02-22_Tiberti_CyberX.pdf
Walter Tiberti, Ph.D
Static and Dynamic Analysis:
Reversing Techniques and Tools
Dipartimento di Ingegneria, scienze dell'informazione e matematica
Università degli Studi dell’Aquila / Italy
v.1.1
Walter Tiberti, Ph.D – walter.tiberti@univaq.it
Researcher (R/TDa) ING-INF/03 @ DISIM Dept. / UNIVAQ
ICT Security course teacher
Topics: Net/SW/HW Security, RAN security, Cryptographic impl. for
embedded systems/FPGA ...etc.
15+ years of experience in Reversing across multiple platforms
From a time when ...
Launching a debugger meant freezing Windows
Processes had to be dumped a page at a time
People had "fun" rewriting the ISRs ☺
Who am I
In Software Security, the term "Static Analysis" refers to the techniques used
to analyse a binary file (e.g., executable, objects, libraries etc.) statically –
i.e., without running it
Among those techniques, the term "Reverse Engineering" (or just Reversing) refers to
the static analysis of the code of the binary with the objective of retrieving the logic of
the application/library (i.e., ideally retrieving some kind of high-level code)
On the other side, the term "Dynamic Analysis" refers to the techniques
used to analyse a binary file at runtime
Two main techniques:
Tracing: running the application while retrieving data (e.g., function calls)
Debugging: running (or attaching to) an application in a controllable environment which allows you
to monitor all the details (down to e.g., CPU registers and memory contents) or even to stop the
execution upon a condition for further analysis of variables and memory
Introduction
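To make the idea of tracing concrete, here is a minimal sketch in Python. It is only an analogue: it traces Python-level function calls with `sys.settrace`, whereas tracers for native binaries observe system or library calls. The `check` and `normalize` functions are hypothetical targets invented for the example.

```python
import sys


def trace_calls(func, *args):
    """Run func(*args) and return (result, names of the Python functions called)."""
    calls = []

    def tracer(frame, event, arg):
        # The global trace function receives a 'call' event for every new Python frame.
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no line-by-line tracing inside the frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the tracer
    return result, calls


# Hypothetical target code, used only for illustration.
def normalize(text):
    return text.strip().lower()


def check(password):
    return normalize(password) == "secret"


if __name__ == "__main__":
    print(trace_calls(check, " Secret "))
```

The target runs normally while the tracer only collects data, which is the difference with respect to debugging, where execution can be paused and modified.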
Types of programming languages:
Compiled – you write the source code and use a compiler toolchain to obtain
an executable file containing data, metadata and assembly code (encoded as
raw bytes) for the target HW platform
C, C++, Rust are examples of compiled programming languages
Interpreted – you write the code and use an interpreter (program) which reads
the source code line by line, executing the statements.
Python, Perl, PHP
Mixed – you write the source code, use a compiler to generate intermediate
files which get interpreted by an interpreter
Java
Context – from code to binary (recap)
Source code (e.g., file1.c )
Pre-processing by pre-processor: file1.c → file1_preproc.c
Compiling by compiler core: file1_preproc.c → file1.s (assembly file)
Assembling by assembler: file1.s → file1.o (object file)
....
Linking by linker: file1.o + file2.o + lib2.a → file1 (ELF executable)
From executable to process
Loading by loader: file1 → process
Context – from code to binary (recap)
From the static/dynamic analysis point-of-view:
Compiled – you have executable files, the data and the assembly code
Interpreted – you have the source files (!!)
Mixed – you have "intermediate files"
For interpreted languages, the static analysis is "just" about reading and
understanding the code
For compiled languages, you have to read and understand the assembly
language (reversing). Also you can retrieve data and metadata used by the
code by examining the rest of the executable file
For mixed languages, you have to read/understand the "intermediate
language" BUT most of the time there is another way (details later)
Context – from code to binary (recap)
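As a rough illustration of what an "intermediate language" looks like, the sketch below uses Python's standard `dis` module. The assumption here is that the reference interpreter (CPython) internally compiles source code to bytecode before executing it, so it can stand in for the "mixed" case; the exact opcode names differ between Python versions. The `add` function is a hypothetical example.

```python
import dis


def add(a, b):
    """Hypothetical function whose intermediate code we want to inspect."""
    return a + b


def opcode_names(func):
    """Return the bytecode instruction names of a function, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    dis.dis(add)               # human-readable listing of the bytecode
    print(opcode_names(add))   # opcode names only (version dependent)
```

Reading such a listing is the intermediate-language counterpart of reading assembly for a compiled executable.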
Working with assembly language means that a good knowledge of the
underlying hardware and software is required.
Keep in mind that:
Different CPU architecture = different assembly language (e.g., x86_64 vs ARM)
Different OS = different executable format (e.g., PE vs. ELF vs. Mach-O)
Different OS = different toolchains and tools
Different OS/HW = different ABI and function calling conventions
....
Different HW, different SW, different approaches
FAQ: do I need to know assembly language?
Short answer: in general, yes.
If you plan to become a full-time professional reverser, you will have to learn assembly
languages and their usage across multiple platforms.
Longer answer: for a specific case, it depends on the target executable
format/language/etc.
Is any additional knowledge required?
Short answer: yes
Longer answer: multi-disciplinary knowledge is almost always necessary, but that's
how reversers learn! Find a topic you know nothing about → learn it.
We are in 2023 and Google/ChatGPT may give you just the pieces of information
required in case you stumble upon an unknown topic
Assembly? Required Knowledge?
Case 1: x86_64 (intel), Linux, C language
x86_64 assembly (CISC)
ELF64 file format
System V AMD64 calling convention (fastcall-like, i.e., the first 6
pointer/integer arguments are passed via
registers)
Case 2: Android, Java
Java bytecode (converted to Dalvik/DEX bytecode for Android)
Jar file (zip) with .class files + metadata (on Android: APK, also a zip, with .dex files)
Case 3: MSP430, no OS, C language
(embedded platform)
MSP430 16-bit MCU assembly (RISC, <50
instructions)
ELF or baremetal code+data
Some examples
Case 4: Raspberry PI, ARM64, Linux
ARM64 (called AArch64) assembly,
RISC
ELF64
Case 5: x86, Windows (32 bit), C++
x86 (32 bit) assembly (CISC)
PE format
Cdecl, stdcall or thiscall depending
on the library/function/OOP
Case 6: x86_64, Windows, C#.NET
MSIL intermediate lang.
PE, PE64, PE+
The basic tools for performing static analysis are the following:
A disassembler: translates raw bytes from code sections into "human-readable" assembly
instructions. Advanced disassemblers also perform code flow analysis, de-reference
symbols/jump/call locations, recognize function, variables, types etc.
Executable analyzer: parse the data structures and metadata contained in the executable file
format.
Various tools for string searches, signature scans, symbol analysis, compiler identification etc.
For compiled software, it is extremely complex to automatically translate back the
assembly code into high-level code (i.e., Decompilation) but in some cases those
tools can provide a very useful (yet imprecise) result
Lastly, there exist some disassemblers/tools focused on specific languages and
frameworks. On their targets, those tools tend to work better than generic disassemblers
Examples: DeDe, JD-GUI/Fernflower/Luyten, ILDasm/dotPeek, etc.
Tool set for static analysis
file scans the target file content and tries to determine the file type
ldd displays the shared libraries required by the target executable
Note: the executable may load additional dynamic libraries at runtime using
dlopen etc.
Other Linux static analysis tools: file, ldd
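The core idea behind `file` can be sketched as a lookup of well-known "magic" bytes at the start of the file. This is a simplified sketch, not how `file` is actually implemented (the real tool uses a much larger rule database); the function name and the short signature table are assumptions made for the example.

```python
# A few well-known magic numbers (first bytes of the file).
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF"),
    (b"MZ", "PE/DOS (MZ header)"),
    (b"\xca\xfe\xba\xbe", "Java class file or Mach-O fat binary"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (little-endian)"),
    (b"PK\x03\x04", "ZIP archive (e.g., JAR/APK)"),
    (b"dex\n", "Android DEX"),
]


def guess_file_type(data: bytes) -> str:
    """Guess the file type from the leading bytes of its content."""
    for magic, name in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(guess_file_type(handle.read(16)))
```

Note that some signatures are ambiguous (the same four bytes open both Java class files and Mach-O fat binaries), which is one reason why real tools look at more than the first few bytes.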
nm displays the symbols defined inside the executable symbol table
Note 1: dynamic symbols can be shown with the -D flag
Note 2: the symbols may be absent if the executable is stripped!
Other Linux static analysis tools: nm
objdump -Mintel -d
Useful when:
No disassembler available for a given HW platform
Speed is needed
Further processing (stdout piping)
Not just disassembling – see man objdump
Other Linux static analysis tools: objdump
strings searches for all the sequences of at least n (default: 4) human-readable ASCII
characters
Useful for:
Checking for "hardcoded" passwords
Locating useful code sections by cross-referencing suspicious strings
Other Linux static analysis tools: strings
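What `strings` does can be approximated in a few lines. The sketch below is an approximation under stated assumptions: it only considers printable ASCII bytes (0x20 to 0x7e) and ignores other encodings; the function name is hypothetical. It also returns the offset of each string, which is what is needed later to cross-reference strings with code.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for every run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for offset, text in extract_strings(handle.read()):
            print(f"{offset:#010x}  {text}")
```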
readelf parses and shows the
ELF data structures inside a
binary file
Useful for looking at:
Section/Segment mapping and
permissions
Code entrypoints
Symbols, Relocations, PLT/GOT
INIT/FINI arrays
Other Linux static analysis tools: readelf
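To give an idea of the kind of data `readelf` decodes, here is a minimal sketch that parses only the first fields of the ELF header (class, endianness, type, machine, entry point). The function name and the small lookup tables are assumptions for the example; a real parser handles many more fields and values.

```python
import struct

ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86_64", 183: "AArch64"}


def parse_elf_header(data: bytes) -> dict:
    """Decode the leading fields of an ELF header from raw bytes."""
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian, 2 = big-endian
    # e_type, e_machine, e_version, e_entry follow the 16-byte e_ident array.
    fmt = endian + ("HHIQ" if is_64 else "HHII")
    e_type, e_machine, _e_version, e_entry = struct.unpack_from(fmt, data, 16)
    return {
        "bits": 64 if is_64 else 32,
        "little_endian": endian == "<",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(parse_elf_header(handle.read(64)))
```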
Elements which are useful during static analyses:
Strings
Imported/Exported functions
Resources (Windows)
TLS functions
Not just code
Virtual Memory: technique to dynamically
manage physical memory.
(Virtual) Address Space: ranges of virtual
addresses reachable by a process
Make sure to understand those concepts. If
you have some doubts, please refer to your
computer architecture book/course
Recap on Computer Architecture elements
The stack is a "special" memory area which
is used as LIFO data structure for storing
temporary data, arguments, variables etc.
A specific CPU register is used to store a
pointer to the top element of the stack
x86: ESP, x86_64: RSP, ARM: SP
The stack grows from higher addresses
towards lower addresses
Allocation on the stack is just a subtraction;
de-allocation is an addition (usually no cleanup)
Some architectures use another CPU register to
store the base of the piece of stack for the
current function (a.k.a. stack frame)
Stack
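The behaviour described above can be modelled with a toy stack. This is only a sketch: the class name, the 8-byte slot size (x86_64-like) and the starting address are assumptions chosen for illustration.

```python
class ToyStack:
    """Toy model of a downward-growing stack with 8-byte slots."""

    SLOT = 8

    def __init__(self, top=0x7FFF_FFFF_F000):
        self.sp = top        # stack pointer: address of the top element
        self.memory = {}     # address -> stored value

    def push(self, value):
        self.sp -= self.SLOT             # the stack grows towards lower addresses
        self.memory[self.sp] = value

    def pop(self):
        value = self.memory[self.sp]     # the old value is not erased (no cleanup)
        self.sp += self.SLOT
        return value

    def alloc(self, nbytes):
        self.sp -= nbytes                # allocation is just a subtraction

    def free(self, nbytes):
        self.sp += nbytes                # de-allocation is just an addition


if __name__ == "__main__":
    stack = ToyStack()
    stack.push(0x41)
    stack.alloc(32)                      # room for local variables
    print(hex(stack.sp))
```

Because `pop` and `free` only move the pointer, stale values remain in memory below the stack pointer, which is often useful to know during an analysis.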
"Heap" in this context is a generic term to
indicate the dynamically allocated memory
malloc, C++'s new etc.
Heap memory is allocated by the OS upon
requests from applications
Different OSes use different ways to manage
heap:
Linux: libc malloc allocator (ptmalloc2 etc.)
Windows: Virtual/Local/GlobalAlloc, pools etc.
One important aspect: in most architectures,
memory is allocated with a page-size granularity!
(e.g., 4 KB)
Heap
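Page-size granularity simply means that the amount of memory obtained from the OS is rounded up to a multiple of the page size. A small sketch of that arithmetic follows; the function name is hypothetical, and `mmap.PAGESIZE` is used as the default page size of the machine running the script.

```python
import mmap


def round_up_to_page(nbytes: int, page_size: int = mmap.PAGESIZE) -> int:
    """Round a request up to the next multiple of the page size."""
    return (nbytes + page_size - 1) // page_size * page_size


if __name__ == "__main__":
    for request in (1, 4096, 4097, 10_000):
        print(request, "->", round_up_to_page(request))
```

User-level allocators such as malloc then carve smaller chunks out of these pages, so a 1-byte allocation does not consume a whole page each time.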
Calling Convention = how function calls are (and should be) performed at assembly level
Where to store function arguments?
Where to store local variables? Etc.
https://en.wikipedia.org/wiki/Calling_convention
The ABI (application binary interface) defines all the details of HOW data and code are represented
Common calling conventions:
cdecl: arguments pushed onto the stack last-to-first, the caller cleans the stack
stdcall: arguments pushed onto the stack last-to-first, the callee cleans the stack
fastcall: first n pointer-sized arguments stored in a pre-determined set of registers, the rest on the
stack/FP registers
thiscall: one register holds the object instance pointer (this)
ABI and Calling Conventions (briefly)
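As an example of a register-based convention, the sketch below models where integer/pointer arguments end up on x86_64 Linux (System V AMD64 ABI): the first six go in rdi, rsi, rdx, rcx, r8, r9, the rest on the stack. It is a simplified model that ignores floating-point and aggregate arguments; the function name is hypothetical.

```python
# Integer/pointer argument registers of the System V AMD64 ABI, in order.
SYSV_INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def place_arguments(args):
    """Return (registers, stack_pushes) for a list of integer/pointer arguments."""
    registers = dict(zip(SYSV_INT_REGS, args))
    # Remaining arguments are pushed last-to-first, so the first stack
    # argument ends up at the lowest address.
    stack_pushes = list(reversed(args[len(SYSV_INT_REGS):]))
    return registers, stack_pushes


if __name__ == "__main__":
    regs, pushes = place_arguments([1, 2, 3, 4, 5, 6, 7, 8])
    print(regs)     # arguments 1..6 in registers
    print(pushes)   # push order for arguments 7 and 8
```

When reading disassembly, this mapping is what lets you recover the arguments of a call: look at the values loaded in these registers right before the call instruction.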
One of the most important skills to acquire to become a better
"reverser" is the ability to efficiently distinguish the code that
needs a detailed analysis from the code that may be skimmed through
Reversing an executable takes time and mental energy!
A methodical approach and good tools may help!
In other words, keep the "focused" analysis surface to a minimum!
Some examples:
Avoid analysing all the functions/instructions etc.
A careful and wise hypothesis on a function's behaviour may save you hours!
Reversing: approach
Example: an executable asking you for a secret password
First objective: restrict the analysis to the password-check
code only
How?
Try to look for message strings (e.g., "Please insert the password") and get the
offset/virtual address
Using the disassembler, search for all the cross-references to that address – probably
you will find some kind of a printing function
In the surrounding of the printing function:
Maybe there is a reading function that waits for user input and saves the user password
(In the simplest case) a comparison between the expected and provided passwords
A basic technique: from strings to code
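The "from strings to code" steps can be sketched on a raw memory image. The sketch rests on strong assumptions: the image is mapped flat at a known base address, and the code refers to the string through a 32-bit absolute little-endian address (as in non-relocatable 32-bit x86 code). Real disassemblers also resolve relative addressing, which this sketch does not; the function name is hypothetical.

```python
import struct


def find_string_xrefs(image: bytes, needle: bytes, base_addr: int):
    """Return (string virtual address, list of addresses that reference it)."""
    offset = image.find(needle)
    if offset < 0:
        return None, []
    string_va = base_addr + offset
    encoded = struct.pack("<I", string_va)   # address as it appears inside instructions
    refs = []
    pos = image.find(encoded)
    while pos >= 0:
        refs.append(base_addr + pos)
        pos = image.find(encoded, pos + 1)
    return string_va, refs


if __name__ == "__main__":
    base = 0x08048000
    # Synthetic image: "push imm32; call ..." followed by padding and the string.
    code = b"\x68" + struct.pack("<I", base + 16) + b"\xe8\x00\x00\x00\x00"
    image = code.ljust(16, b"\x90") + b"Please insert the password\x00"
    va, refs = find_string_xrefs(image, b"Please insert", base)
    print(hex(va), [hex(r) for r in refs])
```

Each reference found this way is a candidate location for the printing code, and therefore a good starting point for locating the password check.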
What if...
There is no trace of message strings?
Maybe encryption is involved...look for the printing functions and trace back from there
The password compared is not what I entered!?
Often, instead of comparing s1 == s2, it is common to compare H(s1) == H(s2) where H
is a string manipulation function
I see no call to library functions
Maybe the program performs dynamic symbol resolution, implements the function
code itself, or uses some kind of obfuscation/trick
Variations
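Two of the variations above can be sketched quickly. The sketch assumes that H is SHA-256 and that messages are hidden with a single-byte XOR; both are illustrative choices, not what any specific program does. The password "letmein", the key 0x5A and the function names are hypothetical.

```python
import hashlib
import hmac

# In a real binary only the digest constant would be stored; it is computed
# here from a hypothetical password to keep the example self-contained.
EXPECTED_DIGEST = hashlib.sha256(b"letmein").digest()


def check_password(candidate: str) -> bool:
    """Compare H(candidate) with the stored digest instead of the plain strings."""
    digest = hashlib.sha256(candidate.encode()).digest()
    return hmac.compare_digest(digest, EXPECTED_DIGEST)


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR: the same call encodes and decodes."""
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    hidden = xor_bytes(b"Please insert the password", 0x5A)
    print(hidden)                     # not readable: a strings search finds nothing useful
    print(xor_bytes(hidden, 0x5A))    # decoded at runtime, right before printing
    print(check_password("letmein"))
```

In the first case the expected password never appears in the binary; in the second, the message only exists in clear text in memory at runtime, which is why tracing back from the printing function helps.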
Executables may contain different anti-analysis tricks. For example:
Stripped executables: all non-essential data/metadata is removed from the
executable (e.g., symbols)
Self-modifying code: the code gets modified/rewritten at runtime!
Obfuscation: junk code or misleading information added to make the analysis
harder
Instruction-decoding tricks: to confuse the disassembler
Packers/Cryptors: the executable is packed/encrypted and
unpacked/decrypted at runtime
Etc.
Anti-reversing techniques
As stated, reversing an executable takes time!
Often, too much!
Alternatives:
Symbolic Execution – idea: treat the executable like a "mathematical
function", given a context and the wanted output, process all the possible
code paths to automatically retrieve the inputs/conditions/etc. required
Fuzzing – idea: if your objective is to find vulnerabilities, a big part of the
work may be performed by finding all the input combinations that somehow
"crash" the executable, so that (later) you can focus your analysis on those.
Alternative to reversing
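The fuzzing idea can be sketched with a naive random fuzzer. This is a toy under stated assumptions: the target is a hypothetical Python function with a deliberate bounds bug, and a "crash" is modelled as an unhandled exception; real fuzzers run native executables and use coverage feedback to guide input generation.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical buggy target: trusts a length byte without checking bounds."""
    if not data:
        return 0
    length = data[0]
    payload = data[1:]
    # Bug: IndexError whenever length is larger than the payload size.
    return payload[length - 1] if length else 0


def fuzz(target, iterations=2000, max_len=8, seed=0):
    """Feed random inputs to target and collect the ones that make it crash."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception as exc:
            crashes.append((data, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record)
    print(len(found), "crashing inputs, e.g.", found[:3])
```

The crashing inputs then tell you where to focus the manual analysis: here they all point to the unchecked length byte.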
With dynamic analysis we aim to analyse the executable (or, better, the
process) behavior at runtime
Useful, because, e.g., tracking register and memory content with only
static analysis is unfeasible
To perform dynamic analysis, we use:
Debuggers
Tracers
Injectors/Dynamic Instrumentation frameworks
Dynamic Analysis
While the "ordinary" use of a debugger is to help developers detect bugs,
in dynamic analysis the debugger represents the main tool for inspecting a
process execution in detail
A debugger usually allows you to:
Launch a target process or attach to an existing one
Inspect memory, CPU registers and everything related to the process
Pause execution when the execution reaches an instruction, upon a condition or
when there is a specific memory access (breakpoint, watchpoint, HW breakpoints
etc.)
Edit memory, registers etc.
Execute 1 instruction at a time (i.e., stepping)
Debugger
A (software) breakpoint is a "mark" that can be set on a specific
address/instruction of the process.
When the execution reaches the breakpoint, the debugger STOPS the execution
and waits for user input
In this way, it is possible to inspect the current status of memory/instructions/registers etc.
Often implemented as software interrupts (e.g., INT 3)
Then, execution may be resumed, restarted, etc.
Special breakpoint types:
Hardware breakpoints: based on the CPU capabilities, useful to "break" on memory access or
to avoid some software breakpoint issues
Watchpoints: breakpoints that trigger when the data at a specific address changes
Debugger: breakpoint
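How an INT 3 software breakpoint is typically planted can be modelled on a byte buffer: the debugger saves the original byte at the target address and overwrites it with the one-byte INT3 opcode (0xCC on x86), restoring it later. In this sketch a `bytearray` stands in for the debuggee's code memory, which a real debugger would modify through OS debugging APIs; the class name is hypothetical.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


class SoftwareBreakpoints:
    """Toy model of setting/clearing software breakpoints in a code buffer."""

    def __init__(self, code: bytearray):
        self.code = code
        self.saved = {}   # offset -> original byte

    def set(self, offset: int) -> None:
        if offset not in self.saved:
            self.saved[offset] = self.code[offset]   # remember the original byte
            self.code[offset] = INT3                 # patch the instruction

    def clear(self, offset: int) -> None:
        self.code[offset] = self.saved.pop(offset)   # restore the original byte


if __name__ == "__main__":
    code = bytearray(b"\x55\x48\x89\xe5\xc3")   # push rbp; mov rbp, rsp; ret
    bps = SoftwareBreakpoints(code)
    bps.set(1)
    print(code.hex())   # second byte is now cc
    bps.clear(1)
    print(code.hex())
```

Since the code bytes are actually modified, a program that checksums its own code can notice a software breakpoint; this is one of the issues that hardware breakpoints avoid.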
"Ring 3" debuggers:
Source-level debuggers: focus on bug-hunting, show debugging symbols,
high-level language lines, etc.
Low-level debuggers: focus on the actual assembly code being executed,
low-level memory access, CPU registers etc.
"Ring 0" debuggers:
Kernel Debuggers: focus on bug-hunting on kernel code, data structures etc.
Different debuggers for different contexts
When the debugger is instructed to launch a new process, it stops at
the first instruction (entrypoint) or at the first instruction of the main()
function
The process is called "debuggee". The debugger is its parent process.
The debugger can also intercept (attach) an already-running process,
pausing its execution and waiting for user commands
Not always possible – in Linux, by default, a process may be debugged only by
a parent process (see YAMA/PTRACE_SCOPE)
Important note:
Only one debugger can be active on a process!
Use of debugger: starting/attaching
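Whether attaching is allowed on a given Linux machine can be checked by reading the Yama setting. The sketch below reads `/proc/sys/kernel/yama/ptrace_scope` and returns `None` when the file is not available (non-Linux systems or kernels without Yama); the function name is hypothetical and the descriptions are a short paraphrase of the kernel documentation.

```python
from pathlib import Path

PTRACE_SCOPE_MEANING = {
    0: "classic: a process may attach to any other process running under the same uid",
    1: "restricted: a process may attach only to its descendants",
    2: "admin-only: attaching requires the CAP_SYS_PTRACE capability",
    3: "no attach: attaching with ptrace is disabled",
}


def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the Yama ptrace_scope value, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    scope = read_ptrace_scope()
    print(scope, "-", PTRACE_SCOPE_MEANING.get(scope, "Yama not available"))
```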
Another important aspect is that modern applications:
Can launch child processes
Can have multiple threads
Those aspects pose various challenges when debugging an application. For
example:
How to debug only one thread? And when the execution is stopped, what happens in other
threads? And in case of locking? Race conditions?
What happens when a process fork()s? Which process is followed by the debugger? Parent
or child?
What happens when a process launches a new program (exec)?
Etc.
In general, multi-threaded applications require a higher level of skill to be analysed
at debugger level
Debugging: multiple process/threads
What about embedded platforms? Three cases:
The platform can run an OS and a debugger (e.g., Raspberry PI)
No OS but a debugging server
Baremetal / JTAG
In the last 2, a technique called "remote debugging" has to be used
The idea is to debug the application running on the platform (server) with a
separate computer (client) using network or serial connections
To use GDB → gdbserver (on the embedded platform) + gdb (on pc)
Relevant projects: OpenOCD: https://openocd.org/
Debugging: remote debugging
Suppose you have an x86_64 laptop and you want to debug an ELF
executable compiled for ARM... how can you do that?
You cannot run it natively...
Two solutions:
1) Use a physical ARM platform
2) Use an emulator
Relevant project: QEMU - https://www.qemu.org/
Cross-platform debugging
Debugger:
Linux: gdb /kgdb
Windows: ollydbg, x64dbg, immunitydbg, WinDBG, etc.
(some disassemblers support also debugging, e.g., IDA, Ghidra, Radare2)
Tracers:
Linux: strace (syscalls) and ltrace (library calls)
Windows: (more rare, debuggers used to trace)
Dyn. Instrumentations
Frida
Dynamic Analysis: tool selection
The de-facto userland debugger for Linux
Powerful, but has a non-trivial learning
curve
Used with plugins that enhance the
experience
GEF: https://hugsy.github.io/gef/
PWNDBG:
https://github.com/pwndbg/pwndbg
GDB documentation:
https://www.sourceware.org/gdb/
GDB
One of the state-of-the-art debuggers for Windows
Similar to OllyDBG (an old but extremely famous debugger)
https://x64dbg.com/
x64dbg
49. 49
49
"The" debugger for Windows, both
userspace/kernel level:
https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools
Recently, it has got a restyle:
https://apps.microsoft.com/store/detail/windbg-preview/9PGJGD53TN86?hl=it-it&gl=it&rtc=1
WinDBG
strace
Launches a program and logs all the Linux system calls executed, with arguments
and return values
ltrace
Similar to strace, but instead logs the library functions executed
Works well with (only) the C library
Frida
A dynamic instrumentation tool that lets you inject JavaScript code to trace and
hook functions
https://frida.re/
Tracers (Linux) and Dyn. Instrumentation
Process monitoring (more in the next slide)
Processes
Files/Registry Keys
Sockets/Pipes/FIFO etc.
eBPF ☺
Other dynamic analysis tools
Linux:
Basic commands: ps, top
/proc filesystem
pspy32/pspy64
lsof, ss, tcpdump, wireshark
Windows:
Task manager / tasklist
Process Explorer - https://learn.microsoft.com/en-us/sysinternals/downloads/process-explorer
Filemon/Regmon/Procmon - https://learn.microsoft.com/en-us/sysinternals/
Process Monitoring tools
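As an example of what the /proc filesystem offers, each line of `/proc/<pid>/maps` describes one region of the process address space (address range, permissions, file offset, device, inode, optional path). A minimal parsing sketch follows; the function name and the returned dictionary layout are assumptions for illustration.

```python
def parse_maps_line(line: str) -> dict:
    """Parse one line of /proc/<pid>/maps into a dictionary."""
    fields = line.strip().split(maxsplit=5)
    start, end = (int(value, 16) for value in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],                            # e.g. r-xp
        "offset": int(fields[2], 16),                  # offset inside the mapped file
        "path": fields[5] if len(fields) > 5 else "",  # empty for anonymous mappings
    }


if __name__ == "__main__":
    # Linux only: print the executable regions of the current process.
    with open("/proc/self/maps") as handle:
        for raw in handle:
            region = parse_maps_line(raw)
            if "x" in region["perms"]:
                print(hex(region["start"]), hex(region["end"]), region["path"])
```

Listing the executable regions is a quick way to see which libraries a running process has really loaded, including those loaded at runtime.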
General resources:
"Nightmare" – introduction to reversing/exploitation through challenges taken from various
CTFs:
https://guyinatuxedo.github.io/
Binary Auditor – (guided) challenges on binary/malware analysis on Windows
https://github.com/Info-security/binary-auditing-training
Pwnables – challenges on reversing+exploitation
https://pwnable.kr/
https://pwnable.tw/
Microcorruption – reversing+exploitation on MSP430-based environment
https://microcorruption.com/
Backup slides