2. Problems
• Software is complex
– Large codebase
– Interaction between components
– Components from different vendors
– Closed source, closed API
• Why understand software?
– As a developer => fewer bugs
– As an administrator => diagnosis
– Curiosity?
• Execution traces contain software behavior
information, but they are huge.
14/Sep/2011 NUS SoC CSTalks 2
3. Software Traces
• Types of traces
– Instruction trace: records machine instructions
– Call trace: records function calls
– System call trace: records system calls
– Software logs: records important events
• System trace
– System call trace from all processes
– Mainly resource usage, system & process
interaction
4. WinResMon
• WinResMon: our trace recorder.
• Works on Windows
• Types of events:
– File: open, read, write, close, rename, …
– Registry: open, get value, set value, delete, …
– Network: connect, listen, send, receive, …
– Process/thread: create, terminate.
5. Information (fields) in an Event
• PID/TID: process/thread ID
• Program name: path of the program’s EXE
• User name/group: process owner
• Start/end time: event timing in CPU ticks
• Operation type: e.g. file open
• Parameter: type-dependent, e.g.
– file path, system call flags, registry path
– IP address
• Call stack trace: call stack in the user process
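One way to hold an event with the fields above is a plain C struct. This is a minimal sketch; the struct and field names, the integer widths, the fixed `MAX_STACK` depth and the `op_type` values are assumptions for illustration, not WinResMon's actual record format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operation codes, loosely following WinResMon's event types. */
enum op_type {
    OP_FILE_OPEN, OP_FILE_READ, OP_FILE_WRITE, OP_FILE_CLOSE, OP_FILE_RENAME,
    OP_REG_OPEN, OP_REG_GET_VALUE, OP_REG_SET_VALUE, OP_REG_DELETE,
    OP_NET_CONNECT, OP_NET_LISTEN, OP_NET_SEND, OP_NET_RECEIVE,
    OP_PROC_CREATE, OP_PROC_TERMINATE
};

#define MAX_STACK 16 /* assumed maximum number of recorded stack frames */

/* Hypothetical layout of one trace event. */
struct trace_event {
    uint32_t pid, tid;             /* process / thread ID */
    const char *program;           /* path of the program's EXE */
    const char *user;              /* process owner (user name/group) */
    uint64_t start_tick, end_tick; /* event timing in CPU ticks */
    enum op_type op;               /* operation type, e.g. OP_FILE_OPEN */
    const char *param;             /* type-dependent: file/registry path or IP address */
    uint32_t flags;                /* system call flags, if any */
    uint64_t stack[MAX_STACK];     /* user-mode call stack (return addresses) */
    int stack_depth;               /* number of valid entries in stack[] */
};

/* Duration of an event in CPU ticks. */
static uint64_t event_duration(const struct trace_event *e) {
    assert(e->end_tick >= e->start_tick);
    return e->end_tick - e->start_tick;
}
```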
6. Why Visualize System Traces?
• Software is complex
– Interaction between modules and with other software
• Software can be closed source, but its interaction
is observable
• Humans are good at detecting
– Repeated patterns
– Anomalies
7. What is DotPlot?
Trace X (horizontal axis): E A C B E E E D C
Trace Y (vertical axis): A C B C D E B C E
[Figure: dot plot of Trace X against Trace Y]
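A minimal C sketch of the classic dot plot on the two traces above, assuming each event is reduced to a single-character label: cell (j, i) gets a dot when event i of Trace X equals event j of Trace Y. The function names `dotplot` and `print_dotplot` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fill grid (ny rows by nx columns, row-major): grid[j*nx + i] is 1 when
   event i of trace x matches event j of trace y. Returns the number of dots. */
static int dotplot(const char *x, const char *y, unsigned char *grid) {
    assert(x != NULL && y != NULL && grid != NULL);
    size_t nx = strlen(x), ny = strlen(y);
    int dots = 0;
    for (size_t j = 0; j < ny; j++) {
        for (size_t i = 0; i < nx; i++) {
            grid[j * nx + i] = (unsigned char)(x[i] == y[j]);
            dots += grid[j * nx + i];
        }
    }
    return dots;
}

/* Print trace x along the top and trace y down the left; '*' marks a dot. */
static void print_dotplot(const char *x, const char *y, const unsigned char *grid) {
    size_t nx = strlen(x), ny = strlen(y);
    printf("  %s\n", x);
    for (size_t j = 0; j < ny; j++) {
        printf("%c ", y[j]);
        for (size_t i = 0; i < nx; i++)
            putchar(grid[j * nx + i] ? '*' : '.');
        putchar('\n');
    }
}
```

Called with `"EACBEEEDC"` and `"ACBCDEBCE"` it produces 18 dots, including a diagonal run for `A C B` (0-based X positions 1 to 3, Y positions 0 to 2), the kind of repeated pattern a dot plot makes visible.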
9. An Example
Visualization comparing MS PowerPoint, MS Word, OO Word and OO PowerPoint.
11. Extended DotPlot
• Matching Rule
– Define whether two events match
– By fields: e.g. “if PIDs and resource paths are
the same”, “if program names are the same”
• DP Coloring Rule
– Define the color of matched events
– Traditional DP uses black only
– Use the RGB model on a black background, CMY on a white background
– Use regular expressions to specify events
– E.g. “.*file_open.*” → blue, “.*reg_.*” → cyan
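The two kinds of rules can be sketched as plain C functions. This is a minimal illustration, assuming a stripped-down event with only PID, program, operation and path fields; the names `match_pid_path`, `match_program` and `color_for` are hypothetical. Colors are chosen with POSIX `regcomp`/`regexec`, so the sketch assumes a POSIX C library, and it colors a matched cell by the operation name of one of its events.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Stripped-down event: only the fields the rules below look at (hypothetical). */
struct ev { int pid; const char *program; const char *op; const char *path; };

/* Matching rule: "if PIDs and resource paths are the same". */
static int match_pid_path(const struct ev *a, const struct ev *b) {
    return a->pid == b->pid && strcmp(a->path, b->path) == 0;
}

/* Matching rule: "if program names are the same". */
static int match_program(const struct ev *a, const struct ev *b) {
    return strcmp(a->program, b->program) == 0;
}

enum color { BLACK, BLUE, CYAN };

struct color_rule { const char *pattern; enum color color; };

/* Coloring rule: the first regex that matches the operation name decides the
   color; unmatched operations stay black, as in a traditional dot plot. */
static enum color color_for(const char *op, const struct color_rule *rules, size_t n) {
    for (size_t k = 0; k < n; k++) {
        regex_t re;
        if (regcomp(&re, rules[k].pattern, REG_EXTENDED | REG_NOSUB) != 0)
            continue; /* skip a malformed pattern */
        int hit = regexec(&re, op, 0, NULL, 0) == 0;
        regfree(&re);
        if (hit)
            return rules[k].color;
    }
    return BLACK;
}
```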
12. Event-ordered and Time-ordered
• Each event takes a different amount of time
• The meaning/unit of each axis: event sequence (event-ordered) or time (time-ordered)
[Figure panels: Event-ordered; Time-ordered]
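The difference between the two orderings is how an event becomes an axis coordinate. In this sketch (function names hypothetical), event-ordered mode gives every event an equal slot, while time-ordered mode places an event in proportion to its start tick, so bursts of short events crowd together.

```c
#include <assert.h>
#include <stdint.h>

/* Event-ordered: each of n_events gets an equal slot on a width-pixel axis,
   so the position depends only on the event's index k. */
static int axis_pos_event_ordered(int k, int n_events, int width) {
    assert(n_events > 0 && k >= 0 && k < n_events && width > 0);
    return (int)((int64_t)k * width / n_events);
}

/* Time-ordered: the position is proportional to the event's start tick
   within the traced interval [t0, t1]. */
static int axis_pos_time_ordered(uint64_t start, uint64_t t0, uint64_t t1, int width) {
    assert(width > 0 && t1 > t0 && start >= t0 && start <= t1);
    int pos = (int)((double)(start - t0) * width / (double)(t1 - t0));
    return pos < width ? pos : width - 1; /* an event at t1 goes in the last pixel */
}
```

For four events starting at ticks 0, 10, 20 and 1000 on a 100-pixel axis, event-ordered mode puts them at pixels 0, 25, 50 and 75, while time-ordered mode squeezes the first three into pixels 0 to 2 and puts the last at pixel 99.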
13. Axis Histogram
– Ticks mark units of time (e.g. 1 second)
– Histogram
• Event density (time-ordered)
• Time spent (event-ordered)
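Both histograms reduce to bucketing. A minimal sketch, assuming start and end ticks are available per event; `unit` (ticks per bin on the time-ordered axis) and `per_bin` (consecutive events per bin on the event-ordered axis) are hypothetical parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Time-ordered axis: count the events whose start tick falls in each bin of
   `unit` ticks (event density). Events outside the bins are ignored. */
static void density_histogram(const uint64_t *start, size_t n, uint64_t t0,
                              uint64_t unit, unsigned *bins, size_t nbins) {
    assert(unit > 0);
    memset(bins, 0, nbins * sizeof *bins);
    for (size_t k = 0; k < n; k++) {
        if (start[k] < t0)
            continue;
        uint64_t b = (start[k] - t0) / unit;
        if (b < nbins)
            bins[b]++;
    }
}

/* Event-ordered axis: total ticks spent by each run of `per_bin` consecutive
   events (time spent). */
static void time_spent_histogram(const uint64_t *start, const uint64_t *end, size_t n,
                                 size_t per_bin, uint64_t *bins, size_t nbins) {
    assert(per_bin > 0);
    memset(bins, 0, nbins * sizeof *bins);
    for (size_t k = 0; k < n; k++) {
        size_t b = k / per_bin;
        if (b < nbins)
            bins[b] += end[k] - start[k];
    }
}
```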
14. Barcode
• One-dimensional
• Highlights user-chosen events
• E.g. file_open → red
• One or more (e.g. three below)
• Barcode coloring rules
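A barcode can be sketched as one stripe per event, colored by the first rule that matches. This sketch uses a substring test (`strstr`) in place of regular expressions to stay short, and encodes colors as characters (`'R'` for red, `'.'` for unhighlighted); both choices and the names are assumptions. Several barcodes are simply several rule sets applied to the same trace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A barcode rule: events whose operation name contains `substr` get `color`. */
struct bar_rule { const char *substr; char color; };

/* Write one character per event into out (which needs n + 1 bytes): the color
   of the first matching rule, or '.' when no rule highlights the event. */
static void barcode(const char *const *ops, size_t n, const struct bar_rule *rules,
                    size_t nrules, char *out) {
    for (size_t k = 0; k < n; k++) {
        out[k] = '.';
        for (size_t r = 0; r < nrules; r++) {
            if (strstr(ops[k], rules[r].substr) != NULL) {
                out[k] = rules[r].color;
                break;
            }
        }
    }
    out[n] = '\0';
}
```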
15. Example 1: File Copying
Self-comparison, event-ordered
xcopy copying 8 files: 1MB,
10KB, 10MB, 100KB, 1MB, 10KB,
10MB and 100KB
DP match: operation + parameter (pathname)
DP color: magenta → source; cyan → destination; black → other
[Figure labels: File Operation; Source/Dst File Operation; Registry Operation]
16. File Size
File size is visible.
The two 1MB and two 10MB files are clearly shown.
The two 10KB and two 100KB files are visible only when zoomed in.
17. Zooming in
DP color: magenta → source; cyan → destination; black → other
18. A Surprise: Registry Operations
Surprisingly many registry operations for
a console application
[Figure label: Registry Operation]
19. Another Surprise: DLLs
DLLs: file operations, but on neither the
source nor the destination.
More time is spent on DLLs than on a 1MB file.
[Figure labels: File Operation; Source/Dst File Operation]
20. Example 2: Software Build
X: succeeded; Y: failed due to a missing .c file
DP match: program + operation + value (pathname)
DP color: black → any
Bar1 color: black → nmake.exe
Bar2 color: cyan → cl.exe; magenta → link.exe
Bar3 color: cyan → reading .c files; magenta → reading .h files
21. Number of Executions
X: 4 compiles (cl.exe), 1 link (link.exe)
Y: 3 compiles, 0 links
Y: the third compile doesn’t read any .c or .h file
Bar2 color: cyan → cl.exe; magenta → link.exe
Bar3 color: cyan → reading .c files; magenta → reading .h files
22. Similarity & Difference
The two traces are similar.
The Y (failed) trace terminates earlier,
right before reading the .c file.
27. R1: Windows Update
• Similar events (darker area) come from the
Windows Auto Updater
• More file operations, fewer registry operations
magenta → wuauclt.exe (Windows Update)
[Figure labels: File Operation; Registry Operation]
29. Visualizing Module Dependencies
• The problem
– There’s a vulnerability in X. Which software uses X?
– Why does my software use X? I never call it.
– Is it safe to uninstall X?
• Software module
– Windows DLLs
– UNIX .so
– Java classes, packages
33. Binary Dependency Visualization
• Two types of nodes: EXE; DLL, etc.
• Three types of directed edges
1. EXE X launches another EXE Y
2. EXE X loads a DLL Y
3. A function in binary X calls a function in binary Y
• How are binaries shared among programs?
– EXE Dependency Graph
– Only type 1 and 2 edges
– Group DLLs by loader
• How do binaries interact?
– DLL Dependency Graph
– Only type 2 and 3 edges
– Group DLLs manually by functionality or software vendor
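Both graphs can be derived from one edge list by keeping only certain edge types. A minimal sketch with hypothetical names; the edge types are numbered as in the list above, and a bit mask selects which types each view keeps.

```c
#include <assert.h>
#include <stddef.h>

/* Edge types, numbered as in the list above. */
enum edge_type { LAUNCHES = 1, LOADS = 2, CALLS = 3 };

struct edge { const char *from, *to; enum edge_type type; };

/* Views select edge types with a bit mask: bit t set keeps type t. */
#define EXE_GRAPH_MASK ((1u << LAUNCHES) | (1u << LOADS))
#define DLL_GRAPH_MASK ((1u << LOADS) | (1u << CALLS))

/* Copy the edges whose type is selected by mask into out; return how many. */
static size_t filter_edges(const struct edge *in, size_t n, unsigned mask, struct edge *out) {
    size_t m = 0;
    for (size_t k = 0; k < n; k++)
        if (mask & (1u << in[k].type))
            out[m++] = in[k];
    return m;
}
```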
37. DLL Dependency Graph: actual binary usage
• Some definitions:
– An EXE-DLL dependency in a DLL Dependency Graph exists
when there is a control transfer from code in
executable x to code in DLL y. We say that x has an EXE-DLL
dependency on y.
– A DLL-DLL dependency in a DLL Dependency Graph exists
when there is a control transfer from code in DLL x to
code in DLL y. We say that x has a DLL-DLL dependency on
y.
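Applying these definitions to a trace needs a map from code addresses to loaded modules. A minimal sketch, assuming each module occupies one contiguous address range and that the trace gives the source and target address of every control transfer; the names are hypothetical, and treating transfers inside one module or into an EXE as "no dependency" is an assumption the definitions leave open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct module { const char *name; uint64_t base, size; int is_exe; };

enum dep_kind { DEP_NONE, DEP_EXE_DLL, DEP_DLL_DLL };

/* Index of the module whose address range contains addr, or -1. */
static int find_module(const struct module *mods, size_t n, uint64_t addr) {
    for (size_t k = 0; k < n; k++)
        if (addr >= mods[k].base && addr - mods[k].base < mods[k].size)
            return (int)k;
    return -1;
}

/* Classify a control transfer from code at src to code at dst. */
static enum dep_kind classify(const struct module *mods, size_t n, uint64_t src, uint64_t dst) {
    int x = find_module(mods, n, src), y = find_module(mods, n, dst);
    assert(mods != NULL);
    if (x < 0 || y < 0 || x == y || mods[y].is_exe)
        return DEP_NONE; /* unknown code, intra-module, or target is not a DLL */
    return mods[x].is_exe ? DEP_EXE_DLL : DEP_DLL_DLL;
}
```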
42. Two Operations
• Diff
– Compare two graphs.
• E.g. from the same program with different environments/inputs
• E.g. from two related programs
– Diff graph G1 and G2 to get G3.
• Projection
– Focus on a particular module X
– Only show modules that call X or are called by X
(recursive definition)
– Project graph G1 on module M to get G2
– Not a simple subgraph problem
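Diff can be sketched as a tagged edge-set union: every edge of G1 and G2 appears once in G3, marked as present in G1 only, G2 only, or both, which is what a renderer needs to color the difference. The names and the quadratic lookup are illustrative assumptions, and each input graph is assumed to contain no duplicate edges.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dep { const char *from, *to; };
enum side { ONLY_G1 = 1, ONLY_G2 = 2, BOTH = 3 };
struct diff_edge { struct dep e; enum side side; };

static int same_edge(const struct dep *a, const struct dep *b) {
    return strcmp(a->from, b->from) == 0 && strcmp(a->to, b->to) == 0;
}

static int contains(const struct dep *g, size_t n, const struct dep *e) {
    for (size_t k = 0; k < n; k++)
        if (same_edge(&g[k], e))
            return 1;
    return 0;
}

/* G3 = union of G1 and G2, each edge tagged with the graph(s) it comes from.
   g3 needs room for n1 + n2 edges. Returns the number of edges in G3. */
static size_t graph_diff(const struct dep *g1, size_t n1, const struct dep *g2, size_t n2,
                         struct diff_edge *g3) {
    size_t m = 0;
    for (size_t k = 0; k < n1; k++)
        g3[m++] = (struct diff_edge){ g1[k], contains(g2, n2, &g1[k]) ? BOTH : ONLY_G1 };
    for (size_t k = 0; k < n2; k++)
        if (!contains(g1, n1, &g2[k]))
            g3[m++] = (struct diff_edge){ g2[k], ONLY_G2 };
    return m;
}
```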
43. Diff of the DLL dependency graphs of Internet Explorer with and without Flash
44. Projection of the DLL dependency graph of Internet Explorer on Flash
47. Visualizing binaries executed
• The call graph is large.
• Group functions into images => DLL dependency
graph.
• The DLL dependency graph is still large.
• Group DLLs by properties:
– By functionality: graphics, audio, network…
– By vendor: Microsoft, Adobe…
– By path: C:\windows\system32\*.dll,
D:\vmware\*.dll…
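Grouping by path can be sketched with POSIX `fnmatch`. The patterns are written in lower case and the path is lowered first, so matching is case-insensitive like Windows paths, and `FNM_NOESCAPE` makes the backslash an ordinary character. The group names, the 260-byte buffer and the `"other"` fallback are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* A grouping rule: DLL paths matching `pattern` (lower case) belong to `group`. */
struct group_rule { const char *pattern; const char *group; };

/* Return the group of the first rule whose pattern matches the DLL path,
   compared case-insensitively; paths matching no rule go to "other". */
static const char *group_of(const char *path, const struct group_rule *rules, size_t n) {
    char lower[260]; /* assumed MAX_PATH-sized buffer; longer paths are truncated */
    size_t k = 0;
    assert(path != NULL);
    for (; path[k] != '\0' && k < sizeof lower - 1; k++)
        lower[k] = (char)tolower((unsigned char)path[k]);
    lower[k] = '\0';
    for (size_t r = 0; r < n; r++)
        /* FNM_NOESCAPE: treat '\' in Windows paths as an ordinary character */
        if (fnmatch(rules[r].pattern, lower, FNM_NOESCAPE) == 0)
            return rules[r].group;
    return "other";
}
```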
48. Visualizing binaries executed (1)
• Generate call tree, call graph, DLL dependency graph
• A Pin tool collects the execution trace
– The trace includes call, return, thread, context and system call
events
– Call and return events record the stack pointer, PC and target
address.
• Maintaining the call stack by tracking calls and
returns is not trivial:
– Non-returning functions (longjmp)
– Threads, fibers
– Context
– Kernel callbacks
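One common way to keep a shadow call stack robust against non-returning functions is to pop by stack pointer rather than one frame per return. This is a sketch of that technique, not necessarily what the authors' Pin tool does: it assumes the stack grows downward and that the SP recorded at a call equals the SP at the matching return; `MAX_DEPTH` and the names are hypothetical. Threads and fibers would each need their own `shadow_stack`, and context changes and kernel callbacks are not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 256 /* assumed maximum call depth */

struct frame { uint64_t target; uint64_t sp; };
struct shadow_stack { struct frame f[MAX_DEPTH]; size_t depth; };

/* On a call, push the callee together with the stack pointer just after the
   call pushed its return address. */
static void on_call(struct shadow_stack *s, uint64_t target, uint64_t sp) {
    assert(s->depth < MAX_DEPTH);
    s->f[s->depth++] = (struct frame){ target, sp };
}

/* On a return executed with stack pointer sp, pop every frame at or below sp.
   Deeper frames have smaller SPs, so frames abandoned by a longjmp are
   discarded together with the frame that is actually returning. */
static void on_return(struct shadow_stack *s, uint64_t sp) {
    while (s->depth > 0 && s->f[s->depth - 1].sp <= sp)
        s->depth--;
}
```

For example, if `main` calls A, A calls B, B calls C, and C then longjmps back into B, the next return from B pops both C's stale frame and B's frame, leaving A on top.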
49. Projection
void A(void);
void B(int i);
void C(void);
void D(void);

int main(void) {
    A();
    B(1);
    return 0;
}
void A(void) {
    B(0);
}
void B(int i) {
    if (i) D();
    else C();
}
void C(void) {}
void D(void) {}
[Figure: Full Graph (nodes main, A, B, C, D); Project on A (nodes main, A, B, C)]
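The projection can be sketched by recording each dynamic call as its full call path from the root, then keeping every edge along the paths that pass through the focus module. This is a minimal sketch with hypothetical names, assuming call paths have already been reconstructed from the trace. On the example above it keeps main → A, A → B and B → C but drops main → B and B → D, which is why projection is not a simple subgraph: B stays, but only the calls B makes while A is on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALL_DEPTH 8 /* assumed maximum call-path length */

/* One dynamic call, recorded as the call path from the root to the callee,
   e.g. {"main", "A", "B"} for the call A -> B made from main -> A. */
struct call { const char *path[MAX_CALL_DEPTH]; size_t len; };

struct pedge { const char *caller, *callee; };

/* Does module m appear anywhere on this call path? */
static int on_path(const struct call *c, const char *m) {
    for (size_t k = 0; k < c->len; k++)
        if (strcmp(c->path[k], m) == 0)
            return 1;
    return 0;
}

/* Append caller -> callee to out unless it is already there. */
static void add_edge(struct pedge *out, size_t *e, const char *caller, const char *callee) {
    for (size_t j = 0; j < *e; j++)
        if (strcmp(out[j].caller, caller) == 0 && strcmp(out[j].callee, callee) == 0)
            return;
    out[(*e)++] = (struct pedge){ caller, callee };
}

/* Project on module m: keep every edge on call paths that pass through m,
   i.e. the chain of callers leading to m and the calls made while m is on
   the stack. Returns the number of distinct edges written to out. */
static size_t project(const struct call *calls, size_t n, const char *m, struct pedge *out) {
    size_t e = 0;
    assert(m != NULL);
    for (size_t k = 0; k < n; k++)
        if (on_path(&calls[k], m))
            for (size_t d = 1; d < calls[k].len; d++)
                add_edge(out, &e, calls[k].path[d - 1], calls[k].path[d]);
    return e;
}
```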