De-Anonymizing Live CDs through
   Physical Memory Analysis
            Andrew Case
       Senior Security Analyst
Speaker Background
• Computer Science degree from the University
  of New Orleans
• Former Security Consultant for Neohapsis
• Worked for Digital Forensics Solutions since
  2009
• Work experience ranges from penetration
  testing to reverse engineering to forensics
  investigations/IR to related research


Agenda
• Discuss Live CDs and how they disrupt the
  normal forensics process
• Present research that enables traditional
  investigative techniques against live CDs
• Discuss issues with Tor’s insecure handling of
  memory and present preliminary memory
  analysis research



Normal Forensics Process
        Obtain Hard Drive



        Acquire Disk Image



           Verify Image



          Process Image



       Perform Investigation


Traditional Analysis Techniques
• Timelining of activity based on MAC times
• Hashing of files
• Indexing and searching of files and
  unallocated space
• Recovery of deleted files
• Application specific analysis
  – Web activity from cache, history, and cookies
  – E-mail activity from local stores (PST, Mbox, …)

Problem of Live CDs
• Live CDs allow users to run an operating
  system and all applications entirely in RAM
• This makes traditional digital forensics
  (examination of disk images) impossible
• All the previously listed analysis techniques
  cannot be performed




The Problem Illustrated
       Obtain Hard Drive



       Acquire Disk Image



          Verify Image



         Process Image



      Perform Investigation


No Disks or Files, Now What?
• All we can obtain is a memory capture
• With this, an investigator is left with very
  limited and crude analysis techniques
• Can still search, but can’t map to files or dates
   – No context, hard to present coherently
• File carving becomes useless
   – Next slide
• Good luck in court


File Carving
• Used extensively to recover previously deleted
  files/data
• Uses a database of headers and footers to find
  files within raw byte streams such as a disk
  image
• Finds instances of each header followed by
  the footer
• Example file formats:
  – JPEG - \xff\xd8\xff\xe0\x00\x10 - \xff\xd9
  – GIF - \x47\x49\x46\x38\x37\x61 - \x00\x3b
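A minimal sketch of header/footer carving over a raw byte stream, using only the two signatures listed above. The function name `carve`, the `max_size` limit, and the sample buffer are illustrative assumptions, not part of any particular carving tool.

```python
# Header/footer pairs taken from the slide (JPEG/JFIF and GIF87a).
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff\xe0\x00\x10", b"\xff\xd9"),
    "gif": (b"\x47\x49\x46\x38\x37\x61", b"\x00\x3b"),
}


def carve(data, max_size=10 * 1024 * 1024):
    """Return (kind, start, end) for each header that is followed by its footer."""
    hits = []
    for kind, (header, footer) in SIGNATURES.items():
        pos = data.find(header)
        while pos != -1:
            # Look for the footer after the header, within an assumed size limit.
            end = data.find(footer, pos + len(header), pos + max_size)
            if end != -1:
                hits.append((kind, pos, end + len(footer)))
            pos = data.find(header, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical buffer: one JPEG-like and one GIF-like object surrounded by junk.
sample = (b"junk" + b"\xff\xd8\xff\xe0\x00\x10JFIF-body\xff\xd9"
          + b"pad" + b"GIF87a-body\x00\x3b")
for kind, start, end in carve(sample):
    print(kind, start, end)
```

The sketch only works when the bytes between header and footer are contiguous, which is exactly the assumption that breaks down for physical memory.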
File Carving Cont.
• File carving relies on contiguous allocation of
  files
  – Luckily modern filesystems strive for low
    fragmentation
• Unfortunately for memory analysis, the physical
  pages of a file are almost never allocated
  contiguously
  – Page size is only 4 KB, so few structured files fit
    in a single page
  – This is the equivalent of a completely fragmented
    filesystem
People Have Caught On…
• The Amnesic Incognito Live System (TAILS) [1]
  – “No trace is left on local storage devices unless
    explicitly asked.”
  – “All outgoing connections to the Internet are
    forced to go through the Tor network”
• Backtrack [2]
  – “ability to perform assessments in a purely native
    environment dedicated to hacking.”



What It Really Means…
• Investigators without deep kernel internals
  knowledge and programming skill are basically
  hopeless
• It is well known that the use of live CDs is
  going to defeat most investigations
  – Main motivation for this work
  – Plenty of anecdotal evidence of this can be found
    through Google searches


What is the Solution?
• Memory Analysis!
  – It is the only method we have available…
• This analysis gives us:
   – The complete file system structure including
     file contents and metadata
   – Deleted Files (Maybe)
   – Userland process memory and file system
     information

Goal 1: Recovering the File System
• Steps needed to achieve this goal:
   1. Understand the in-memory filesystem
   2. Develop an algorithm that can enumerate
      directories and files
   3. Recover metadata to enable timelining and
      other investigative techniques




The In-Memory Filesystem
• AUFS (AnotherUnionFS)
  – http://aufs.sourceforge.net/
  – Used by TAILS, Backtrack, Ubuntu 10.04 installer,
    and a number of other Live CDs
  – Not included in the vanilla kernel, loaded as an
    external module




AUFS Internals
• Stackable filesystem
  • Presents a multilayer filesystem as a single one to users
  • This allows for files created after system boot to be
    transparently merged on top of read only CD
• Each layer is termed a branch
  • In the live CD case, one branch for the CD, and one for all
    other files made or changed since boot




AUFS Userland View of TAILS
Mount points relevant to AUFS:

# cat /proc/mounts
  aufs / aufs rw,relatime,si=4ef94245,noxino
  /dev/loop0 /filesystem.squashfs squashfs
  tmpfs /live/cow tmpfs
  tmpfs /live tmpfs rw,relatime

The mount point of each AUFS branch:

# cat /sys/fs/aufs/si_4ef94245/br0
  /live/cow=rw
# cat /sys/fs/aufs/si_4ef94245/br1
  /filesystem.squashfs=rr
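A small sketch of how the userland view could be scripted: find the `si=` identifier of each aufs mount and derive the sysfs directory that holds its `brN` entries. The helper names and the sample text are assumptions for illustration. The sample line carries the trailing `0 0` fields that a real /proc/mounts line has.

```python
import re


def aufs_sysfs_dirs(mounts_text):
    """Return the /sys/fs/aufs/si_<id> directory for every aufs mount."""
    dirs = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 4 and fields[2] == "aufs":
            match = re.search(r"(?:^|,)si=([0-9a-f]+)", fields[3])
            if match:
                dirs.append("/sys/fs/aufs/si_" + match.group(1))
    return dirs


def parse_branch(entry):
    """Split a brN entry such as '/live/cow=rw' into (path, permission)."""
    path, _, perm = entry.strip().rpartition("=")
    return path, perm


sample_mounts = (
    "aufs / aufs rw,relatime,si=4ef94245,noxino 0 0\n"
    "tmpfs /live tmpfs rw,relatime 0 0\n"
)
print(aufs_sysfs_dirs(sample_mounts))
print(parse_branch("/live/cow=rw"))
```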
Forensics Approach
• No real need to copy files from the read-only
  branch
  – Just image the CD
• On the other hand, the writable branch
  contains every file that was created or
  modified since boot
  – Including metadata
  – No deleted ones though, more on that later


Linux Internals Overview I
• struct dentry
  – Represents a directory entry (directory, file, …)
  – Contains the name of the directory entry and a
    pointer to its inode structure
• struct inode
  – FS generic, in-memory representation of a disk inode
  – Contains address_space structure that links an inode
    to its file’s pages
• struct address_space
  – Links physical pages together into something useful
  – Holds the search tree of pages for a file

Linux Internals Overview II
• Page Cache
  – Used to store struct page structures that
    correspond to physical pages
  – address_space structures contain linkage into the
    page cache that allows for ordered enumeration
    of all physical pages pertaining to an inode
• Tmpfs
  – In-memory filesystem
  – Used by TAILS to hold the writable branch


Enumerating Directories
• Once we can enumerate directories, we can
  recover the whole filesystem
• Not as simple as recursively walking the
  children of the file system’s root directory
• AUFS creates hidden dentries and inodes in
  order to mask branches of the stacked
  filesystem
• Need to move carefully between AUFS and
  tmpfs structures
Directory Enumeration Algorithm
1) Walk the super blocks list until the “aufs”
   filesystem is found
   • This contains a pointer to the root dentry
2) For each child dentry, test if it represents a
   directory
   If the child is a directory:
     • Obtain the hidden directory entry (next slide)
     • Record metadata and recurse into directory
   If the child is a regular file:
     • Obtain the hidden inode and record metadata
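The recursion in step 2 can be sketched against a toy in-memory model. Everything here is an assumption made for illustration: the `Dentry` and `Inode` classes are simplified stand-ins for the kernel structures, the `hidden` dictionary stands in for the per-branch lookup, and step 1 (locating the aufs super block) is skipped by starting from a hand-built root.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Inode:
    is_dir: bool
    size: int = 0
    mtime: int = 0


@dataclass
class Dentry:
    name: str
    inode: Inode
    children: List["Dentry"] = field(default_factory=list)     # like d_subdirs
    hidden: Dict[int, "Dentry"] = field(default_factory=dict)  # branch -> hidden dentry


def walk(dentry: Dentry, branch: int, path: str = "") -> Iterator[Tuple[str, Inode]]:
    """Yield (path, hidden inode) for every entry present in one branch."""
    for child in dentry.children:
        lower = child.hidden.get(branch)
        if lower is None:  # entry does not exist in this branch
            continue
        full = path + "/" + child.name
        yield full, lower.inode  # record metadata from the hidden inode
        if lower.inode.is_dir:
            yield from walk(child, branch, full)


RW_BRANCH = 0  # assumed index of the writable branch


def visible(name, inode, children=(), branch=RW_BRANCH):
    """Build a toy aufs-level dentry whose hidden dentry lives in `branch`."""
    return Dentry(name, inode, list(children), {branch: Dentry(name, inode)})


root = Dentry("/", Inode(True), [
    visible("home", Inode(True), [visible("notes.txt", Inode(False, 12, 1300000000))]),
    visible("bin", Inode(True), branch=1),  # only on the read-only CD branch
])
for path, inode in walk(root, RW_BRANCH):
    print(path, "dir" if inode.is_dir else inode.size)
```

Restricting the walk to the writable branch matches the forensics approach described earlier: the read-only branch can simply be imaged from the CD.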

Obtaining a Hidden Directory
• Each kernel dentry stores a pointer to an au_dinfo
  structure inside its d_fsdata member
• The di_hdentry member of au_dinfo is an array of
  au_hdentry structures that embed regular kernel
  dentries

   struct dentry     struct au_dinfo     Branch   Dentry
   {                 {                   0        Pointer
      d_inode           au_hdentry       1        Pointer
      d_name         }
      d_subdirs
      d_fsdata
   }


Obtaining Metadata
• All useful metadata, such as MAC times, file
  size, and file owner, is contained in the hidden
  inode
• This information is used to reproduce the output
  of the stat command and the istat functionality
  of the Sleuth Kit
• Timelining becomes possible again
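As a rough illustration of why recovered MAC times matter, the sketch below turns per-file timestamps into a single time-ordered event list. The record layout, the paths, and the timestamps are hypothetical. Real values would come from the hidden inodes.

```python
from datetime import datetime, timezone


def timeline(records):
    """Flatten (path, {'m': .., 'a': .., 'c': ..}) records into sorted events."""
    events = []
    for path, times in records:
        for kind, stamp in times.items():
            events.append((stamp, kind, path))
    return sorted(events)


# Hypothetical recovered metadata (Unix timestamps).
records = [
    ("/home/user/a.txt", {"m": 1300000300, "a": 1300000500, "c": 1300000300}),
    ("/tmp/b.log", {"m": 1300000100, "a": 1300000400, "c": 1300000100}),
]
for stamp, kind, path in timeline(records):
    when = datetime.fromtimestamp(stamp, tz=timezone.utc).isoformat()
    print(when, kind, path)
```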



Obtaining a Hidden Inode
• Each aufs controlled inode gets embedded in an
  aufs_icntnr
• This structure also embeds an array of au_hinode
  structures which can be indexed by branch number to
  find the hidden inode of an exposed inode

   struct aufs_icntnr   struct au_iinfo   Branch   struct inode
   {                    {                 0        Pointer
     iinfo                 ii_hinode      1        Pointer
     inode              }
   }

Goal 2: Recovering File Contents
• The size of a file is kept in its inode’s i_size
  member
• The page_tree member of an inode’s address_space
  is the root of the radix tree of its physical pages
• In order to recover file contents this tree
  needs to be searched for each page of a file
• The lookup function returns a struct page
  which leads to the backing physical page

Recovering File Contents Cont.
• Indexing the tree in order and gathering
  each page leads to accurate recovery of a
  whole file
• This algorithm assumes that swap isn’t being
  used
  – Using swap would defeat much of the purpose of
    anonymous live CDs
• Tmpfs analysis is useful for every distribution
  – Many distros mount /tmp using tmpfs, shmem,
    etc
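The reassembly step can be sketched as follows. The radix-tree walk is replaced here by a plain dictionary that maps a page index to that page's offset in the memory capture. That mapping, the function name, and the sample image are assumptions for illustration only.

```python
PAGE_SIZE = 4096  # 4k pages, as stated earlier in the deck


def recover_file(image, page_offsets, i_size):
    """Rebuild a file from its pages in index order.

    image        -- raw physical memory capture (bytes)
    page_offsets -- {page index in file: offset of that page in image},
                    a stand-in for looking up each index in the radix tree
    i_size       -- file size taken from the inode
    """
    out = bytearray()
    n_pages = (i_size + PAGE_SIZE - 1) // PAGE_SIZE
    for index in range(n_pages):
        offset = page_offsets.get(index)
        if offset is None:
            out += b"\x00" * PAGE_SIZE  # page not found: pad to keep offsets aligned
        else:
            out += image[offset:offset + PAGE_SIZE]
    return bytes(out[:i_size])  # trim the final page down to i_size


# Hypothetical capture: the file's two pages are out of order and not adjacent.
image = b"B" * PAGE_SIZE + b"x" * PAGE_SIZE + b"A" * PAGE_SIZE
data = recover_file(image, {0: 2 * PAGE_SIZE, 1: 0}, 5000)
print(len(data), data[:4], data[-4:])
```

Trimming to `i_size` matters because the last page of a file is usually only partly used.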
Goal 3: Recovering Deleted Info
• Discussion:
  1. Formulate Approach
  2. Discuss the kmem_cache and how it relates
     to recovery
  3. Attempt to recover previously deleted file
     and directory names, metadata, and file
     contents



Approach
• We want orderly recovery
• To accomplish this, information about deleted
  files and directories needs to be found in a
  non-standard way
  – All regular lists, hash tables, and so on lose track
    of structures as they are deleted
• Need a way to gather these structures in an
  orderly manner
  – kmem_cache analysis to the rescue!

Recovery through kmem_cache analysis
• A kmem_cache holds all structures of the
  same type in an organized manner
  – Allows for instant allocations & deallocations
  – Used for handling processes, memory mappings,
    open files, and many other structures
• Implementation controlled by allocator in use
  – SLAB and SLUB are the two main ones




kmem_cache Internals
• Both allocators keep track of allocated and
  previously de-allocated objects on three lists:
  – full, in which all objects are allocated
  – partial, a mix of allocated and de-allocated objects
  – free, previously freed objects*
• The free lists are cleared in an allocator
  dependent manner
  – SLAB leaves free lists intact for long periods of
    time
  – SLUB is more aggressive
kmem_cache Illustrated
• /proc/slabinfo contains information about
  each current kmem_cache
• Example output:
 # name <active_objs> <num_objs>
 task_struct 101        154
 mm_struct    76         99
 filp        901       1420

 The difference between num_objs and active_objs is
 how many free objects are being tracked by the kernel
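The subtraction described above is easy to script. This sketch parses slabinfo-style text and reports, per cache, how many objects are tracked but not active. The function name is an assumption, and the sample uses the three rows from the example output.

```python
def freed_objects(slabinfo_text):
    """Return {cache name: num_objs - active_objs} for each cache line."""
    freed = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip blank lines, the version banner, and the column header.
        if len(fields) < 3 or line.startswith(("#", "slabinfo")):
            continue
        name, active, total = fields[0], int(fields[1]), int(fields[2])
        freed[name] = total - active
    return freed


sample_slabinfo = """# name <active_objs> <num_objs>
task_struct 101 154
mm_struct 76 99
filp 901 1420
"""
print(freed_objects(sample_slabinfo))
```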



Recovery Using kmem_cache Analysis
• Enumeration of the lists with free entries
  reveals previous objects still being tracked by
  the kernel
  – The kernel does not clear the memory of these
    objects
• Our previous work has demonstrated that
  much previously de-allocated, forensically
  interesting information can be leveraged from
  these caches [4]
Recovering Deleted Filesystem
             Structure
• Both Linux kernel and aufs directory entries
  are backed by the kmem_cache
• Recovery of these structures reveals names of
  previous files and directories
  – If the d_parent member is still intact, entries can
    be placed within the file system
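A sketch of how d_parent can place a recovered entry back into the tree: follow parent pointers until the root, which in Linux is the dentry that is its own parent. The dictionary of recovered dentries, the addresses, and the names are hypothetical.

```python
def rebuild_path(addr, dentries, max_depth=64):
    """Follow d_parent pointers from a recovered dentry up to the root.

    dentries maps a dentry address to (d_name, d_parent address).
    Returns None when a parent pointer no longer leads to a known dentry.
    """
    parts = []
    for _ in range(max_depth):
        entry = dentries.get(addr)
        if entry is None:
            return None  # d_parent is stale or was overwritten
        name, parent = entry
        if parent == addr:  # the root dentry is its own parent
            return "/" + "/".join(reversed(parts))
        parts.append(name)
        addr = parent
    return None  # pointer cycle or implausibly deep path


# Hypothetical dentries recovered from a kmem_cache.
recovered = {
    0x1000: ("/", 0x1000),
    0x2000: ("home", 0x1000),
    0x3000: ("secret.txt", 0x2000),
    0x4000: ("orphan", 0xDEAD),  # parent pointer no longer valid
}
print(rebuild_path(0x3000, recovered))
print(rebuild_path(0x4000, recovered))
```

Entries whose chain breaks can still be reported by name alone, just without a location in the tree.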




Recovering Previous Metadata
• Inodes are also backed by the kmem_cache
• Recovery means we can timeline again
• Also, the dentry list of the AUFS inodes still
  has entries (strange)
  – This allows us to link inodes and dentries together
  – Now we can reconstruct previously deleted file
    information with not only file names & paths, but
    also MAC times, sizes, inode numbers, and more



Recovering File Contents – Bad News
• Again, inodes are kept in the kmem_cache
• Unfortunately, page cache entries are
  removed upon deallocation, making lookup
  impossible
  – A large number of pointers would need to stay
    intact for this to work
• This removes the ability to recover file
  contents in an orderly manner
• Other ways may be possible, but will require
  more research
Summary of File System Analysis
• Can completely recover the in-memory
  filesystem, its associated metadata, and all file
  contents
• Ordered, partial recovery of deleted file
  names and their metadata is also possible
• Traditional forensics techniques can be made
  possible against live CDs
  – Making such analysis accessible to all investigators

Implementation
• Recovery code was originally written as
  loadable kernel modules
  – Allowed for rapid development and testing of
    ideas
  – A second implementation was developed for Volatility
• VMware Workstation snapshots were used to
  avoid rebooting of the live CD and
  reinstallation of software
  – TAILS doesn’t include development tools/headers
  – This saved days of research time
Testing
• Output was compared to known data sets
  – Directories and files with scripted contents
  – Metadata was compared to the stat command
  – File contents were compared to scripted contents
• Deleted information was analyzed through
  previously allocated structures
  – While a file was still allocated, its dentry, inode, etc
    pointers were saved
  – File was deleted and these addresses were
    examined for previous data
Memory Analysis of Tor




Tor Overview
• Used by millions of people worldwide to
  perform anonymous Internet communications
• Anonymity of communications is essential to
  whistleblowers, journalists from nations
  without freedom of the press, and to a
  number of other professions
• Any recovery of Tor related activity can have
  dire consequences for such people


One Slide Technical Overview
• Tor encrypts client traffic and relays it through
  a number of other hosts before it reaches its
  destination
• Only the final Tor endpoint can decrypt the
  actual packet contents
  – All others can only decrypt necessary routing
    information
• The endpoint used is changed at regular
  intervals to ensure that a compromise does
  not remove all anonymity
Tor Analysis Motivation
• Forensics/IR Perspective
  – TAILS and a number of other live CDs use Tor to
    avoid network forensics
  – Not being able to obtain or reconstruct traffic can
    make certain investigation scenarios impossible
  – If memory analysis can reveal useful evidence
    then the inability to perform network analysis is
    not as painful



Tor Analysis Motivation
• Privacy Perspective
  – Tor provides an extremely useful platform to
    perform anonymous communications
  – To ensure that communications are indeed secure,
    memory analysis needs to be performed on all
    systems that process unencrypted data




Analyzing Memory Activity of Tor
• Analysis reveals that Tor does not always
  securely erase memory after it is used
• Sound Familiar?
• Since we have access to the process memory
  of Tor we should be able to recover data of
  interest….
  – Papers discussing how to recover userland process
    memory are referenced in the white paper



Initial Setup & Analysis
• Privoxy is a Tor-aware HTTP proxy
• Tor was installed along with Privoxy on the
  test virtual machine
• wget was then configured to use Privoxy
  which would relay the information to Tor
• Before digging into source code, performed
  the Poor Man’s Test (next slide)



The Poor Man’s Test
1. Used wget to recursively download
   digitalforensicssolutions.com
2. Verified Tor network connections closed
3. Used memfetch [3] to dump the heap of the
   tor process
4. Ran strings on heap file
5. # grep -c digitalforensics strings-output
    7
            Looking good so far….
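Steps 4 and 5 can be approximated in a few lines of Python for readers without the command-line tools at hand. This is only a rough equivalent of `strings` piped into `grep -c`: it extracts printable ASCII runs and counts those containing the search term. The heap bytes below are made up for the example.

```python
import re


def strings(data, min_len=4):
    """Yield runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def count_matches(data, needle):
    """Count extracted strings that contain needle, similar to grep -c."""
    return sum(1 for text in strings(data) if needle in text)


# Hypothetical heap fragment with two hits separated by binary data.
heap = (b"\x00\x01Host: www.digitalforensicssolutions.com\x00\xff\xfeabc\x00"
        b"Referer: http://www.digitalforensicssolutions.com/\x00")
print(count_matches(heap, "digitalforensics"))
```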
Initial Analysis Results
• Analysis revealed that HTTP headers,
  downloaded page contents, server
  information, and more were contained in its
  memory
• It seemed that the last used HTTP header was
  kept in memory
  – Possibly a single buffer used for this?
  – Numerous instances were found for the other
    types of data


Interesting Output from Strings
1) HTTP REQUEST
GET /incidence-response.html HTTP/1.0
Referer: http://www.digitalforensicssolutions.com/
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: www.digitalforensicssolutions.com

2) HTML fragments from downloaded webpage
<h2>Evidence Preservation</h2>
<p>Our evidence preservation methodology provides an exact
   copy of any digital evidence and ensures that the authenticity
   and integrity of both the duplicate copy and the original data
   source is preserved.</p>
<h2>Evidence Custody</h2>
Digging Deeper into Tor
• After seeing the previous results, source code
  analysis was performed
• Again, orderly collection of data is our goal
• Much more analysis is possible than what was
  covered in this initial analysis
• Still on-going research…




Developed Analysis Scripts
• Two Python scripts were developed that pull
  information from a Tor process
  – The first enumerates and obtains the Tor freelist
  – The second enumerates Tor cells




Script 1 - Walking Tor’s freelist
• Tor keeps “chunks” in its global freelist in
  order to provide fast allocation of new
  memory
  – Very similar to the workings of the kmem_cache
  – The script enumerates the freelist array and
    dumps all memory contained




Freelist Structure
// freelist is an instance of this structure
typedef struct chunk_freelist_t {
   size_t alloc_size; // size of chunk
   int cur_length;    // number on list
   chunk_t *head;
} chunk_freelist_t;

// Each chunk is represented by a chunk_t
typedef struct chunk_t {
  struct chunk_t *next;
  size_t datalen;
  char *data;
} chunk_t;
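A sketch of a freelist walk over a toy memory buffer, using only the fields shown above. Several things are assumptions: a 64-bit little-endian layout with 4 bytes of padding after `cur_length`, addresses that are simply offsets into the buffer, and dumping `alloc_size` bytes per chunk. The real structures in a given Tor version may carry more fields, so offsets would need to be checked against that version.

```python
import struct

# Assumed layouts for the simplified structures on the slide.
FREELIST = struct.Struct("<Qi4xQ")  # alloc_size, cur_length, padding, head
CHUNK = struct.Struct("<QQQ")       # next, datalen, data


def walk_freelist(mem, freelist_addr):
    """Yield alloc_size bytes of data for each chunk on one freelist."""
    alloc_size, _cur_length, head = FREELIST.unpack_from(mem, freelist_addr)
    seen = set()
    addr = head
    while addr and addr not in seen:  # stop at NULL or on a pointer cycle
        seen.add(addr)
        nxt, _datalen, data = CHUNK.unpack_from(mem, addr)
        yield bytes(mem[data:data + alloc_size])
        addr = nxt


# Toy memory: one freelist at offset 8 holding two 16-byte chunks.
mem = bytearray(256)
FREELIST.pack_into(mem, 8, 16, 2, 32)  # head chunk at offset 32
CHUNK.pack_into(mem, 32, 64, 0, 96)    # next chunk at 64, data at 96
CHUNK.pack_into(mem, 64, 0, 0, 112)    # end of list, data at 112
mem[96:112] = b"GET /index.html "
mem[112:128] = b"Host: example.or"
for chunk in walk_freelist(mem, 8):
    print(chunk)
```

`datalen` is ignored on purpose in this sketch, on the assumption that a freed chunk may no longer report the length of the data it once held.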
Script 2 - Tor’s Cell Pool Cache
• In Tor, all data is sent and received as a packed
  cell
• cell_pool is a memory pool that holds cells
  allocated and deallocated by Tor
  – Unless the pool is cleaned
• Walking of this pool enumerates every cell
  structure including its contents (payload)
• Unfortunately the payloads are encrypted

Cell Pool Structures & Enumeration
• cell_pool is of type mp_pool_t
• The recovery script walks the three mp_chunk_t lists
  as well as the doubly linked list contained in each
  mp_chunk_t
• This leads to the type-agnostic mem buffer of each
  chunk
struct mp_pool_t {
   struct mp_chunk_t *empty_chunks, *used_chunks, *full_chunks;
   size_t item_alloc_size;
};
struct mp_chunk_t {
   mp_chunk_t *next;
   mp_chunk_t *prev;
   size_t mem_size;
   char mem[1];
};

Recovery of Packed Cells
• mp_chunk_t structures hold type-agnostic
  data
• In the cell pool these are represented by a:
  typedef struct packed_cell_t {
     struct packed_cell_t *next;
     char body[CELL_NETWORK_SIZE];
  } packed_cell_t;
• Walking the next list retrieves reachable
  packed cells
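A sketch of walking the next list and splitting each cell body into its header and payload. The assumptions are significant and should be verified against the Tor version under analysis: 64-bit little-endian pointers, `CELL_NETWORK_SIZE` of 512, a body that starts with a 2-byte circuit ID and a 1-byte command in network byte order, and addresses that are offsets into a toy buffer.

```python
import struct

CELL_NETWORK_SIZE = 512              # assumed value for this sketch
POINTER = struct.Struct("<Q")        # assumed 64-bit little-endian pointer
CELL_HEADER = struct.Struct(">HB")   # assumed: circuit ID, command


def walk_cells(mem, first):
    """Yield (address, circ_id, command, payload) for each reachable cell."""
    seen = set()
    addr = first
    while addr and addr not in seen:  # stop at NULL or on a pointer cycle
        seen.add(addr)
        (nxt,) = POINTER.unpack_from(mem, addr)
        start = addr + POINTER.size
        body = bytes(mem[start:start + CELL_NETWORK_SIZE])
        circ_id, command = CELL_HEADER.unpack_from(body)
        yield addr, circ_id, command, body[CELL_HEADER.size:]
        addr = nxt


# Toy memory holding two linked cells.
mem = bytearray(2048)
POINTER.pack_into(mem, 16, 1024)     # cell at 16, next cell at 1024
mem[24:27] = b"\x00\x05\x03"         # circuit ID 5, command 3
POINTER.pack_into(mem, 1024, 0)      # last cell in the list
mem[1032:1035] = b"\x00\x06\x03"     # circuit ID 6, command 3
for addr, circ_id, command, payload in walk_cells(mem, 16):
    print(addr, circ_id, command, len(payload))
```

The payload is returned as-is. As noted above, it is encrypted, so the walk only recovers ciphertext plus the cleartext header fields.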
Conclusion
• Memory Analysis of Live CDs is no longer
  difficult
• Use of the presented research enables
  traditional forensics techniques to be used
• As if we didn’t know already, applications are
  really bad about handling of sensitive data in
  memory



Future Work – Live CD Filesystems
• Integrate analysis code into Volatility
• Test against more Live CDs / aufs
  configurations
  – aufs has a number of configuration options
• Look into stackable filesystems used by other
  Live CDs
  – Unionfs is a good target (used by Debian, Gentoo,
    etc)


Future Work - Tor
• Work on recovery of encrypted Tor cells
   – Need to find the encryption key, match it to the
     packed cell, and then decrypt the payload
     section
• Tor developers are aware of the memory
  handling issues; their response will determine
  the amount of further work possible



Comments? Questions?
• Full details of work are in our whitepaper
• Contact: andrew@digdeeply.com




References
[1] https://amnesia.boum.org/
[2] http://www.backtrack-linux.org
[3] lcamtuf.coredump.cx/soft/memfetch.tgz
[4] A. Case et al., "Treasure and Tragedy in kmem_cache Mining
   for Live Forensics Investigation," Proceedings of the 10th
   Annual Digital Forensics Research Workshop (DFRWS 2010),
   Portland, OR, 2010.





Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

De-Anonymizing Live CDs through Physical Memory Analysis

  • 1. De-Anonymizing Live CDs through Physical Memory Analysis Andrew Case Senior Security Analyst
  • 2. Speaker Background • Computer Science degree from the University of New Orleans • Former Security Consultant for Neohapsis • Worked for Digital Forensics Solutions since 2009 • Work experience ranges from penetration testing to reverse engineering to forensics investigations/IR to related research 2
  • 3. Agenda • Discuss Live CDs and how they disrupt the normal forensics process • Present research that enables traditional investigative techniques against live CDs • Discuss issues with Tor’s insecure handling of memory and present preliminary memory analysis research 3
  • 4. Normal Forensics Process Obtain Hard Drive Acquire Disk Image Verify Image Process Image Perform Investigation 4
  • 5. Traditional Analysis Techniques • Timelining of activity based on MAC times • Hashing of files • Indexing and searching of files and unallocated space • Recovery of deleted files • Application specific analysis – Web activity from cache, history, and cookies – E-mail activity from local stores (PST, Mbox, …) 5
  • 6. Problem of Live CDs • Live CDs allow users to run an operating system and all applications entirely in RAM • This makes traditional digital forensics (examination of disk images) impossible • All the previously listed analysis techniques cannot be performed 6
  • 7. The Problem Illustrated Obtain Hard Drive Acquire Disk Image Verify Image Process Image Perform Investigation 7
  • 8. No Disks or Files, Now What? • All we can obtain is a memory capture • With this, an investigator is left with very limited and crude analysis techniques • Can still search, but can’t map to files or dates – No context, hard to present coherently • File carving becomes useless – Next slide • Good luck in court 8
  • 9. File Carving • Used extensively to recover previously deleted files/data • Uses a database of headers and footers to find files within raw byte streams such as a disk image • Finds instances of each header followed by the footer • Example file formats: – JPEG - \xff\xd8\xff\xe0\x00\x10 - \xff\xd9 – GIF - \x47\x49\x46\x38\x37\x61 - \x00\x3b 9
  • 10. File Carving Cont. • File carving relies on contiguous allocation of files – Luckily modern filesystems strive for low fragmentation • Unfortunately for memory analysis, physical pages for files are almost never allocated contiguously – Page size is only 4k so no structured file will fit – Is the equivalent of a completely fragmented filesystem 10
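The header/footer matching described in the two File Carving slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logic of any particular carving tool: the `carve` function and its `max_size` cap are hypothetical names, and the JPEG signature bytes are the ones quoted on the slide.

```python
from typing import Iterator, Tuple

# Signatures as quoted on the slide.
JPEG_HEADER = b"\xff\xd8\xff\xe0\x00\x10"
JPEG_FOOTER = b"\xff\xd9"


def carve(data: bytes, header: bytes, footer: bytes,
          max_size: int = 10 * 1024 * 1024) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, blob) for each header followed by the nearest footer.

    The blob is assumed to be stored contiguously, which is exactly the
    assumption that fails for file pages scattered through physical memory.
    """
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            return
        end = data.find(footer, start + len(header))
        if end == -1:
            return
        end += len(footer)
        if end - start <= max_size:
            yield start, data[start:end]
            pos = end                      # continue after the carved blob
        else:
            pos = start + len(header)      # oversized match, skip this header
```

Calling `list(carve(image, JPEG_HEADER, JPEG_FOOTER))` on a contiguous disk image returns candidate files. On a memory capture the bytes between a header and the next footer usually belong to unrelated 4k pages, so the carved blob is mostly noise.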
  • 11. People Have Caught On… • The Amnesic Incognito Live System (TAILS) [1] – “No trace is left on local storage devices unless explicitly asked.” – “All outgoing connections to the Internet are forced to go through the Tor network” • Backtrack [2] – “ability to perform assessments in a purely native environment dedicated to hacking.” 11
  • 12. What It Really Means… • Investigators without deep kernel internals knowledge and programming skill are basically hopeless • It is well known that the use of live CDs is going to defeat most investigations – Main motivation for this work – Plenty of anecdotal evidence of this can be found through Google searches 12
  • 13. What is the Solution? • Memory Analysis! • It is the only method we have available… • This Analysis gives us: – The complete file system structure including file contents and metadata – Deleted Files (Maybe) – Userland process memory and file system information 13
  • 14. Goal 1: Recovering the File System • Steps needed to achieve this goal: 1. Understand the in-memory filesystem 2. Develop an algorithm that can enumerate directories and files 3. Recover metadata to enable timelining and other investigative techniques 14
  • 15. The In-Memory Filesystem • AUFS (AnotherUnionFS) – http://aufs.sourceforge.net/ – Used by TAILS, Backtrack, Ubuntu 10.04 installer, and a number of other Live CDs – Not included in the vanilla kernel, loaded as an external module 15
  • 16. AUFS Internals • Stackable filesystem • Presents a multilayer filesystem as a single one to users • This allows for files created after system boot to be transparently merged on top of the read-only CD • Each layer is termed a branch • In the live CD case, one branch for the CD, and one for all other files made or changed since boot 16
  • 17. AUFS Userland View of TAILS • Mount points relevant to AUFS: # cat /proc/mounts → aufs / aufs rw,relatime,si=4ef94245,noxino ; /dev/loop0 /filesystem.squashfs squashfs ; tmpfs /live/cow tmpfs ; tmpfs /live tmpfs rw,relatime • The mount point of each AUFS branch: # cat /sys/fs/aufs/si_4ef94245/br0 → /live/cow=rw ; # cat /sys/fs/aufs/si_4ef94245/br1 → /filesystem.squashfs=rr 17
  • 18. Forensics Approach • No real need to copy files from the read-only branch – Just image the CD • On the other hand, the writable branch contains every file that was created or modified since boot – Including metadata – No deleted ones though, more on that later 18
  • 19. Linux Internals Overview I • struct dentry – Represents a directory entry (directory, file, …) – Contains the name of the directory entry and a pointer to its inode structure • struct inode – FS generic, in-memory representation of a disk inode – Contains address_space structure that links an inode to its file’s pages • struct address_space – Links physical pages together into something useful – Holds the search tree of pages for a file 19
  • 20. Linux Internals Overview II • Page Cache – Used to store struct page structures that correspond to physical pages – address_space structures contain linkage into the page cache that allows for ordered enumeration of all physical pages pertaining to an inode • Tmpfs – In-memory filesystem – Used by TAILS to hold the writable branch 20
  • 21. Enumerating Directories • Once we can enumerate directories, we can recover the whole filesystem • Not as simple as recursively walking the children of the file system’s root directory • AUFS creates hidden dentrys and inodes in order to mask branches of the stacked filesystem • Need to carefully interact between AUFS and tmpfs structures 21
  • 22. Directory Enumeration Algorithm 1) Walk the super blocks list until the “aufs” filesystem is found • This contains a pointer to the root dentry 2) For each child dentry, test if it represents a directory If the child is a directory: • Obtain the hidden directory entry (next slide) • Record metadata and recurse into directory If the child is a regular file: • Obtain the hidden inode and record metadata 22
  • 23. Obtaining a Hidden Directory • Each kernel dentry stores a pointer to an au_dinfo structure inside its d_fsdata member • The di_hdentry member of au_dinfo is an array of au_hdentry structures that embed regular kernel dentrys [Diagram: struct dentry { d_inode, d_name, d_subdirs, d_fsdata } → struct au_dinfo { au_hdentry } → table mapping branch number (0, 1) to a dentry pointer] 23
  • 24. Obtaining Metadata • All useful metadata such as MAC times, file size, file owner, etc is contained in the hidden inode • This information is used to fill the stat command and istat functionality of the Sleuthkit • Timelining becomes possible again 24
  • 25. Obtaining a Hidden Inode • Each aufs controlled inode gets embedded in an aufs_icntnr • This structure also embeds an array of au_hinode structures which can be indexed by branch number to find the hidden inode of an exposed inode [Diagram: struct aufs_icntnr { iinfo, inode } → struct au_iinfo { ii_hinode } → table mapping branch number (0, 1) to a struct inode pointer] 25
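The directory enumeration algorithm and the hidden dentry/inode lookups from the preceding slides can be modelled roughly as follows. This is a sketch over mock Python objects, not code that reads a memory image: the `hidden` lists are hypothetical stand-ins for `d_fsdata -> au_dinfo.di_hdentry` and `aufs_icntnr -> au_iinfo.ii_hinode`, the `walk` and `at_branch` helpers are invented for illustration, and recursing over the exposed children while taking metadata from the hidden objects is an assumption about how the steps fit together.

```python
import stat
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Inode:
    i_mode: int
    i_size: int = 0
    i_mtime: int = 0
    # Stand-in for aufs_icntnr -> au_iinfo.ii_hinode, indexed by branch number.
    hidden: List[Optional["Inode"]] = field(default_factory=list)


@dataclass
class Dentry:
    d_name: str
    d_inode: Inode
    d_subdirs: List["Dentry"] = field(default_factory=list)
    # Stand-in for d_fsdata -> au_dinfo.di_hdentry, indexed by branch number.
    hidden: List[Optional["Dentry"]] = field(default_factory=list)


def at_branch(slots: list, branch: int):
    """Return the hidden object for a branch, or None if that branch has none."""
    return slots[branch] if 0 <= branch < len(slots) else None


def walk(dentry: Dentry, branch: int, path: str = "",
         out: Optional[List[Tuple[str, int, int]]] = None):
    """Collect (path, size, mtime) for every entry present in the given branch."""
    if out is None:
        out = []
    for child in dentry.d_subdirs:
        child_path = f"{path}/{child.d_name}"
        if stat.S_ISDIR(child.d_inode.i_mode):
            hdentry = at_branch(child.hidden, branch)      # hidden directory entry
            if hdentry is not None:
                out.append((child_path, hdentry.d_inode.i_size,
                            hdentry.d_inode.i_mtime))
            walk(child, branch, child_path, out)           # recurse into directory
        else:
            hinode = at_branch(child.d_inode.hidden, branch)  # hidden inode
            if hinode is not None:
                out.append((child_path, hinode.i_size, hinode.i_mtime))
    return out
```

With branch 0 as the writable tmpfs branch (as in the TAILS mount listing), `walk(root, 0)` returns only entries created or modified since boot, which matches the forensic goal of skipping the read-only CD branch.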
  • 26. Goal 2: Recovering File Contents • The size of a file is kept in its inode’s i_size member • The page_tree member of an inode’s address_space is the root of the radix tree of its physical pages • In order to recover file contents this tree needs to be searched for each page of a file • The lookup function returns a struct page which leads to the backing physical page 26
  • 27. Recovering File Contents Cont. • Indexing the tree in order and gathering of each page will lead to accurate recovery of a whole file • This algorithm assumes that swap isn’t being used – Using swap would defeat much of the purpose of anonymous live CDs • Tmpfs analysis is useful for every distribution – Many distros mount /tmp using tmpfs, shmem, etc 27
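The in-order page gathering described for Goal 2 might look like the sketch below. It assumes a hypothetical `lookup_page(index)` callable that plays the role of the radix tree lookup and returns the 4k page contents (or `None`); zero-filling a missing page is an illustrative choice, not something the slides specify.

```python
from typing import Callable, Optional

PAGE_SIZE = 4096  # page size stated in the slides


def recover_file(i_size: int,
                 lookup_page: Callable[[int], Optional[bytes]]) -> bytes:
    """Rebuild a file by fetching its pages in index order.

    i_size is the inode's file size; the final page is truncated so the
    result is exactly i_size bytes long.
    """
    page_count = (i_size + PAGE_SIZE - 1) // PAGE_SIZE
    out = bytearray()
    for index in range(page_count):
        page = lookup_page(index)
        if page is None:
            # Assumption: treat an absent page as a hole and zero-fill it.
            page = b"\x00" * PAGE_SIZE
        out += page
    return bytes(out[:i_size])
```

For a quick check, a plain dictionary can stand in for the tree: `recover_file(5000, {0: page0, 1: page1}.get)` returns the first page followed by the first 904 bytes of the second.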
  • 28. Goal 3: Recovering Deleted Info • Discussion: 1. Formulate Approach 2. Discuss the kmem_cache and how it relates to recovery 3. Attempt to recover previously deleted file and directory names, metadata, and file contents 28
  • 29. Approach • We want orderly recovery • To accomplish this, information about deleted files and directories needs to be found in a non-standard way – All regular lists, hash tables, and so on lose track of structures as they are deleted • Need a way to gather these structures in an orderly manner — kmem_cache analysis to the rescue! 29
  • 30. Recovery through kmem_cache analysis • A kmem_cache holds all structures of the same type in an organized manner – Allows for instant allocations & deallocations – Used for handling of processes, memory mappings, open files, and many other structures • Implementation controlled by allocator in use – SLAB and SLUB are the two main ones 30
  • 31. kmem_cache Internals • Both allocators keep track of allocated and previously de-allocated objects on three lists: – full, in which all objects are allocated – partial, a mix of allocated and de-allocated objects – free, previously freed objects* • The free lists are cleared in an allocator dependent manner – SLAB leaves free lists intact for long periods of time – SLUB is more aggressive 31
  • 32. kmem_cache Illustrated • /proc/slabinfo contains information about each current kmem_cache • Example output (# name <active_objs> <num_objs>): task_struct 101 154 ; mm_struct 76 99 ; filp 901 1420 • The difference between num_objs and active_objs is how many free objects are being tracked by the kernel 32
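The free-object arithmetic on this slide can be reproduced with a short parser. This is a minimal sketch that reads only the first three columns (name, active_objs, num_objs); the `free_objects` function name is hypothetical, and the sample text is the example output from the slide.

```python
from typing import Dict


def free_objects(slabinfo_text: str) -> Dict[str, int]:
    """Map each cache name to num_objs - active_objs."""
    result = {}
    for line in slabinfo_text.splitlines():
        line = line.strip()
        # Skip blanks, the column header, and the version banner.
        if not line or line.startswith("#") or line.startswith("slabinfo"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            active_objs, num_objs = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        result[fields[0]] = num_objs - active_objs
    return result


SLIDE_EXAMPLE = """\
# name <active_objs> <num_objs>
task_struct 101 154
mm_struct 76 99
filp 901 1420
"""

print(free_objects(SLIDE_EXAMPLE))
```

For the slide's numbers this gives 53 free task_struct objects, 23 free mm_struct objects, and 519 free filp objects, each a candidate for recovery of previously de-allocated data.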
  • 33. Recovery Using kmem_cache Analysis • Enumeration of the lists with free entries reveals previous objects still being tracked by the kernel – The kernel does not clear the memory of these objects • Our previous work has demonstrated that much previously de-allocated, forensically interesting information can be leveraged from these caches [4] 33
  • 34. Recovering Deleted Filesystem Structure • Both Linux kernel and aufs directory entries are backed by the kmem_cache • Recovery of these structures reveals names of previous files and directories – If d_parent member is still intact, can place entries within file system 34
  • 35. Recovering Previous Metadata • Inodes are also backed by the kmem_cache • Recovery means we can timeline again • Also, the dentry list of the AUFS inodes still has entries (strange) – This allows us to link inodes and dentrys together – Now we can reconstruct previously deleted file information with not only file names & paths, but also MAC times, sizes, inode numbers, and more 35
  • 36. Recovering File Contents – Bad News • Again, inodes are kept in the kmem_cache • Unfortunately, page cache entries are removed upon deallocation, making lookup impossible – A large number of pointers would need to stay intact for this to work • This removes the ability to recover file contents in an orderly manner • Other ways may be possible, but will require more research 36
  • 37. Summary of File System Analysis • Can completely recover the in-memory filesystem, its associated metadata, and all file contents • Ordered, partial recovery of deleted file names and their metadata is also possible • Traditional forensics techniques can be made possible against live CDs – Making such analysis accessible to all investigators 37
  • 38. Implementation • Recovery code was originally written as loadable kernel modules – Allowed for rapid development and testing of ideas – 2nd implementation was developed for Volatility • VMware Workstation snapshots were used to avoid rebooting of the live CD and reinstallation of software – TAILS doesn’t include development tools/headers – This saved days of research time 38
  • 39. Testing • Output was compared to known data sets – Directories and files with scripted contents – Metadata was compared to the stat command – File contents were compared to scripted contents • Deleted information was analyzed through previously allocated structures – While a file was still allocated, its dentry, inode, etc pointers were saved – File was deleted and these addresses were examined for previous data 39
  • 41. Tor Overview • Used by millions of people worldwide to perform anonymous Internet communications • Anonymity of communications is essential to whistleblowers, journalists from nations without freedom of the press, and to a number of other professions • Any recovery of Tor related activity can have dire consequences for such people 41
  • 42. One Slide Technical Overview • Tor encrypts traffic from clients and relays it through a number of other hosts before it is sent to the final destination • Only the final Tor endpoint can decrypt the actual packet contents – All others can only decrypt necessary routing information • The endpoint used is changed at regular intervals to ensure that a compromise does not remove all anonymity 42
  • 43. Tor Analysis Motivation • Forensics/IR Perspective – TAILS and a number of other live CDs use Tor to avoid network forensics – Not being able to obtain or reconstruct traffic can make certain investigation scenarios impossible – If memory analysis can reveal useful evidence then the inability to perform network analysis is not as painful 43
  • 44. Tor Analysis Motivation • Privacy Perspective – Tor provides an extremely useful platform to perform anonymous communications – To ensure that communications are indeed secure, memory analysis needs to be performed on all systems that process unencrypted data 44
  • 45. Analyzing Memory Activity of Tor • Analysis reveals that Tor does not always securely erase memory after it is used • Sound Familiar? • Since we have access to the process memory of Tor we should be able to recover data of interest…. – Papers discussing how to recover userland process memory are referenced in the white paper 45
  • 46. Initial Setup & Analysis • Privoxy is a Tor-aware HTTP proxy • Tor was installed along with Privoxy on the test virtual machine • wget was then configured to use Privoxy which would relay the information to Tor • Before digging into source code, performed the Poor Man’s Test (next slide) 46
  • 47. The Poor Man’s Test 1. Used wget to recursively download digitalforensicssolutions.com 2. Verified Tor network connections closed 3. Used memfetch [3] to dump the heap of the tor process 4. Ran strings on heap file 5. # grep -c digitalforensics strings-output 7 Looking good so far…. 47
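Steps 4 and 5 of the Poor Man's Test (running strings on the heap dump, then counting matches with grep -c) can be approximated in Python as below. This is a rough stand-in rather than a reimplementation of either tool: the function names are hypothetical, and only printable ASCII runs of at least four bytes are treated as strings.

```python
import re
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[bytes]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return re.findall(pattern, data)


def count_matching_strings(data: bytes, needle: bytes) -> int:
    """Count extracted strings that contain needle (like `grep -c`)."""
    return sum(1 for s in printable_strings(data) if needle in s)
```

Reading the dumped heap file with `open(path, "rb").read()` and passing it to `count_matching_strings(heap, b"digitalforensics")` performs the same check as the slide. A nonzero count means request or page data survived in the process heap after the connections closed.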
  • 48. Initial Analysis Results • Analysis revealed that HTTP headers, downloaded page contents, server information, and more were contained in its memory • It seemed that the last used HTTP header was kept in memory – Possibly a single buffer used for this? – Numerous instances were found for the other types of data 48
  • 49. Interesting Output from Strings 1) HTTP REQUEST GET /incidence-response.html HTTP/1.0 Referer: http://www.digitalforensicssolutions.com/ User-Agent: Wget/1.12 (linux-gnu) Accept: */* Host: www.digitalforensicssolutions.com 2) HTML fragments from downloaded webpage <h2>Evidence Preservation</h2> <p>Our evidence preservation methodology provides an exact copy of any digital evidence and ensures that the authenticity and integrity of both the duplicate copy and the original data source is preserved.</p> <h2>Evidence Custody</h2> 49
  • 50. Digging Deeper into Tor • After seeing the previous results, source code analysis was performed • Again, orderly collection of data is our goal • Much more analysis is possible than what was covered in this initial analysis • Still on-going research… 50
  • 51. Developed Analysis Scripts • Two Python scripts were developed that pull information from a Tor process – The first enumerates and obtains the Tor freelist – The second enumerates Tor cells 51
  • 52. Script 1 - Walking Tor’s freelist • Tor keeps “chunks” in its global freelist in order to provide fast allocation of new memory – Very similar to the workings of the kmem_cache – The script enumerates the freelist array and dumps all memory contained 52
  • 53. Freelist Structure • freelist is an instance of this structure: typedef struct chunk_freelist_t { size_t alloc_size; /* size of chunk */ int cur_length; /* number on list */ chunk_t *head; } • Each chunk is represented by a chunk_t: typedef struct chunk_t { struct chunk_t *next; size_t datalen; char *data; } chunk_t; 53
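Walking the `next` pointers of `chunk_t` from a freelist head could be sketched as follows. This is not the script from the talk: the `Memory` class is a hypothetical flat view of process memory, and the layout (little-endian, 8-byte pointers and `size_t`, fields in the order shown on the slide with no padding) is an assumption that would need to match the actual Tor build being analyzed.

```python
import struct
from typing import Iterator, Tuple

# Assumed layout of chunk_t as shown on the slide: next, datalen, data.
CHUNK_FMT = "<QQQ"
CHUNK_SIZE = struct.calcsize(CHUNK_FMT)


class Memory:
    """Hypothetical flat view of a dumped region starting at address base."""

    def __init__(self, base: int, blob: bytes):
        self.base = base
        self.blob = blob

    def read(self, addr: int, size: int) -> bytes:
        off = addr - self.base
        if off < 0 or off + size > len(self.blob):
            raise ValueError(f"address {addr:#x} is outside the dump")
        return self.blob[off:off + size]


def walk_chunks(mem: Memory, head: int,
                limit: int = 4096) -> Iterator[Tuple[int, bytes]]:
    """Yield (chunk address, data bytes) for each chunk reachable from head."""
    seen = set()
    addr = head
    # Stop on NULL, on a pointer cycle, or after limit chunks.
    while addr and addr not in seen and len(seen) < limit:
        seen.add(addr)
        nxt, datalen, data_ptr = struct.unpack(CHUNK_FMT,
                                               mem.read(addr, CHUNK_SIZE))
        yield addr, mem.read(data_ptr, datalen)
        addr = nxt
```

The cycle check and the `limit` cap are defensive choices for freed memory, where pointers may be stale. A corrupted `datalen` would still raise `ValueError` from `Memory.read`, which a real tool would likely catch and log.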
  • 54. Script 2 - Tor’s Cell Pool Cache • In Tor, all data is sent and received as a packed cell • cell_pool is a memory pool that holds cells allocated and deallocated by Tor – Unless the pool is cleaned • Walking of this pool enumerates every cell structure including its contents (payload) • Unfortunately the payloads are encrypted 54
  • 55. Cell Pool Structures & Enumeration • cell_pool is of type mp_pool_t • The recovery script walks the three mp_chunk_t lists as well as the doubly linked list contained in each mp_chunk_t • This leads to the type-agnostic mem buffer of each chunk • struct mp_pool_t { struct mp_chunk_t *empty_chunks, *used_chunks, *full_chunks; size_t item_alloc_size; } • struct mp_chunk_t { mp_chunk_t *next; mp_chunk_t *prev; size_t mem_size; char mem[1]; } 55
  • 56. Recovery of Packed Cells • mp_chunk_t structures hold type-agnostic data • In the cell pool these are represented by a: typedef struct packed_cell_t { struct packed_cell_t *next; char body[CELL_NETWORK_SIZE]; } packed_cell_t; • Walking the next list retrieves reachable packed cells 56
  • 57. Conclusion • Memory Analysis of Live CDs is no longer difficult • Use of the presented research enables traditional forensics techniques to be used • As if we didn’t know already, applications are really bad about handling of sensitive data in memory 57
  • 58. Future Work – Live CD Filesystems • Integrate analysis code into Volatility • Test against more Live CDs / aufs configurations – aufs has a number of configuration options • Look into stackable filesystems used by other Live CDs – Unionfs is a good target (used by Debian, Gentoo, etc) 58
  • 59. Future Work - Tor • Work on recovery of encrypted Tor cells – Need to find the encryption key, match to packed cell, and then decrypt the payload section • Tor developers are aware of the memory handling issues, response will determine amount of further work possible 59
  • 60. Comments? Questions? • Full details of work are in our whitepaper • Contact: andrew@digdeeply.com 60
  • 61. References [1] https://amnesia.boum.org/ [2] http://www.backtrack-linux.org [3] lcamtuf.coredump.cx/soft/memfetch.tgz [4] A. Case, et al, "Treasure and Tragedy in kmem_cache Mining for Live Forensics Investigation," Proceedings of the 10th Annual Digital Forensics Research Workshop (DFRWS 2010), Portland, OR, 2010. 61