In this talk, Gil Yankovitch discusses the PaX patch for the Linux kernel, focusing on memory manager changes and security mechanisms for memory allocations, reads and writes between user and kernel space, and ASLR.
2. Prerequisites
● Knowledge of kernel internals (duh!)
● Knowledge of the Memory Management model
● Slab Memory allocation
...
○ But I’ll cover everything anyway...
3. Key Points
● What is grsecurity and PaX
● How to apply it
○ Configuration variations
● PaX Memory Manager features
○ PAX_USERCOPY
○ PAX_MEMORY_SANITIZE
○ PAX_ASLR
■ PAX_RANDSTACK
4. Linus Torvalds
● Linus is not a big fan of security...
“One reason I refuse to bother with the whole security circus is that I think it glorifies - and
thus encourages - the wrong behavior. It makes "heroes" out of security people, as if the people
who don't just fix normal bugs aren't as important.”
5. grsecurity - Technical details
● PaX is a set of features within the grsecurity (grsecurity.net) patch
● Linux Kernel patch
● Stable for kernel versions 3.2.72 and 3.14.54
● Test patch for 4.3.5
● Was free until 08/2015
○ Stopped being free because of trademark infringement by vendors
● Supports all major architectures
○ x86/x64, arm, ppc, mips
6. Applying the patch
● Very hard.
● Doesn’t always work.
● Needs specific commands depending on your machine.
● But some say this works:
$ patch -p1 < grsecurity-3.1-4.3.5-201602032209.patch
● So now, well, you don’t have any excuse not to use it.
● Now grsecurity is applied to your kernel.
● Even just by doing this we improve our system’s security. You’ll see...
7. Activating specific features
● grsecurity supports automatic configuration
● Automatic configuration depends on choosing between two factors:
○ Performance
○ Security
● They usually don’t go hand in hand...
$ make menuconfig
Security options --->
Grsecurity --->
11. copy_from_user
● Not many use this function
● Eventually leads to this piece of code:
static __always_inline __must_check
unsigned long __copy_from_user_nocheck(void *dst, const void __user *src, unsigned long size)
{
    size_t sz = __compiletime_object_size(dst);
    unsigned ret = 0;

    if (size > INT_MAX)
        return size;

    check_object_size(dst, size, false);
arch/x86/include/asm/uaccess_64.h
● Let’s focus on check_object_size()
12. check_object_size()
void __check_object_size(const void *ptr, unsigned long n, bool to_user, bool const_size)
{
#ifdef CONFIG_PAX_USERCOPY
    const char *type;
#endif
...
#ifdef CONFIG_PAX_USERCOPY
    if (!n)
        return;

    type = check_heap_object(ptr, n);
    if (!type) {
        int ret = check_stack_object(ptr, n);

        if (ret == 1 || ret == 2)
            return;
        if (ret == 0) {
            if (check_kernel_text_object((unsigned long)ptr, (unsigned long)ptr + n))
                type = "";
            else
                return;
        } else
            type = "";
    }

    pax_report_usercopy(ptr, n, to_user, type);
#endif
fs/exec.c
13. check_heap_object()
const char *check_heap_object(const void *ptr, unsigned long n)
{
    struct page *page;
    struct kmem_cache *s;
    unsigned long offset;

    if (ZERO_OR_NULL_PTR(ptr))
        return "<null>";
    if (!virt_addr_valid(ptr))
        return NULL;

    page = virt_to_head_page(ptr);
    if (!PageSlab(page))
        return NULL;

    s = page->slab_cache;
    if (!(s->flags & SLAB_USERCOPY))
        return s->name;

    offset = (ptr - page_address(page)) % s->size;
    if (offset <= s->object_size && n <= s->object_size - offset)
        return NULL;

    return s->name;
}
mm/slub.c
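The final bounds test in check_heap_object() is easy to exercise in isolation. Below is a hedged userspace sketch (all names are invented; `size` stands in for s->size and `object_size` for s->object_size) of the single-object containment check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when [ptr, ptr+n) stays within the payload of a single
 * slab object: 'size' is the full per-object stride (payload + metadata),
 * 'object_size' is the usable payload. */
static bool copy_within_one_object(unsigned long slab_start, unsigned long ptr,
                                   unsigned long n, unsigned long size,
                                   unsigned long object_size)
{
    unsigned long offset = (ptr - slab_start) % size;   /* offset inside one object */

    return offset <= object_size && n <= object_size - offset;
}
```

A copy that starts in the metadata area, or runs past the payload into the next object, fails the check and (in the real code) causes the slab name to be reported.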
14. check_stack_object()
● First, a few basic checks
#ifdef CONFIG_PAX_USERCOPY
/* 0: not at all, 1: fully, 2: fully inside frame, -1: partially (implies an error) */
static noinline int check_stack_object(const void *obj, unsigned long len)
{
    const void * const stack = task_stack_page(current);
    const void * const stackend = stack + THREAD_SIZE;

#if defined(CONFIG_FRAME_POINTER) && defined(CONFIG_X86)
    const void *frame = NULL;
    const void *oldframe;
#endif

    if (obj + len < obj)
        return -1;
    if (obj + len <= stack || stackend <= obj)
        return 0;
    if (obj < stack || stackend < obj + len)
        return -1;
fs/exec.c
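These three preliminary checks can be modelled in userspace with plain integers. A hedged sketch (function name invented; `stack`/`stackend` here are just example bounds, not real kernel stack addresses):

```c
#include <assert.h>

/* Mirrors the preliminary checks of check_stack_object():
 *  -1: length overflow or partial overlap with the stack (error),
 *   0: object entirely off the stack,
 *   1: object fully inside the stack (the real code then walks frames). */
static int stack_object_check(unsigned long stack, unsigned long stackend,
                              unsigned long obj, unsigned long len)
{
    if (obj + len < obj)                        /* length wraps the address space */
        return -1;
    if (obj + len <= stack || stackend <= obj)  /* entirely outside the stack */
        return 0;
    if (obj < stack || stackend < obj + len)    /* straddles a stack boundary */
        return -1;
    return 1;                                   /* fully inside */
}
```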
15. All hail gcc features!
● gcc has many wonderful features
○ Some of them documented, some of them undocumented.
○ One of the documented ones:
— Built-in Function: void * __builtin_frame_address (unsigned int level)
This function is similar to __builtin_return_address, but it returns the address of the function frame rather than the return address of the function. Calling __builtin_frame_address with a value of 0 yields the frame address of the current function, a value of 1 yields the frame address of the caller of the current function, and so forth.
...
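The builtin is easy to try out in userspace with gcc or clang (`frame_of_callee` is a made-up name for this demo):

```c
#include <stddef.h>

/* noinline keeps the callee's frame distinct from its caller's; if the
 * function were inlined, "current frame" would merge with the caller's. */
__attribute__((noinline)) static void *frame_of_callee(void)
{
    /* level 0: the frame address of this very function */
    return __builtin_frame_address(0);
}
```

Nonzero levels walk up the call chain, which is exactly what the PaX stack walk relies on (together with frame pointers being present).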
17. check_kernel_text_object()
● Now all that’s left is to verify we are not reading/writing from .text
#ifdef CONFIG_PAX_USERCOPY
static inline bool check_kernel_text_object(unsigned long low, unsigned long high)
{
    ...
    unsigned long textlow = (unsigned long)_stext;
    unsigned long texthigh = (unsigned long)_etext;

    /* check against linear mapping as well */
    if (high > (unsigned long)__va(__pa(textlow)) &&
        low < (unsigned long)__va(__pa(texthigh)))
        return true;

    if (high <= textlow || low >= texthigh)
        return false;
    else
        return true;
}
#endif
fs/exec.c
18. check_object_size() - Again, now from the other side
void __check_object_size(const void *ptr, unsigned long n, bool to_user, bool const_size)
{
#ifdef CONFIG_PAX_USERCOPY
    const char *type;
#endif
...
#ifdef CONFIG_PAX_USERCOPY
    if (!n)
        return;

    type = check_heap_object(ptr, n);
    if (!type) {
        int ret = check_stack_object(ptr, n);

        if (ret == 1 || ret == 2)
            return;
        if (ret == 0) {
            if (check_kernel_text_object((unsigned long)ptr, (unsigned long)ptr + n))
                type = "";
            else
                return;
        } else
            type = "";
    }

    pax_report_usercopy(ptr, n, to_user, type);
#endif
fs/exec.c
20. How you’d imagine it
● Generally, what this means is that on every deallocation, we’d like to sanitize
the memory.
● If you ask me, it should look something like this:
void kfree(const void *block)
{
    /* Some kfree() logic */
    ...
    memset(block, 0x42, len);
}
kernel/lets_hope_its_that_way.c
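The imagined kfree() above has a catch: the allocator must know `len` at free time. A userspace toy that tracks the length in a small header shows the idea (all names invented, not the PaX implementation):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The sanitization step itself: scrub a buffer of known length. */
static void sanitize(void *p, size_t len)
{
    memset(p, 0, len);
}

/* Allocation header remembering the payload length. */
struct san_hdr { size_t len; };

static void *san_alloc(size_t len)
{
    struct san_hdr *h = malloc(sizeof(*h) + len);

    if (!h)
        return NULL;
    h->len = len;
    return h + 1;            /* hand out the payload after the header */
}

/* Scrub the payload, then release the whole allocation. */
static void san_free(void *block)
{
    struct san_hdr *h = (struct san_hdr *)block - 1;

    sanitize(block, h->len); /* the sanitize-on-free step */
    free(h);
}
```

The kernel does not need such a header: the slab cache already knows every object's size, which is why the hook lives in the slab allocator.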
21. Security is configurable! (not really…)
● PaX is configured for fast sanitization by default
○ Configurable via kernel command line
#ifdef CONFIG_PAX_MEMORY_SANITIZE
enum pax_sanitize_mode pax_sanitize_slab __read_only = PAX_SANITIZE_SLAB_FAST;

static int __init pax_sanitize_slab_setup(char *str)
{
    if (!str)
        return 0;

    if (!strcmp(str, "0") || !strcmp(str, "off")) {
        pax_sanitize_slab = PAX_SANITIZE_SLAB_OFF;
    } else if (!strcmp(str, "1") || !strcmp(str, "fast")) {
        pax_sanitize_slab = PAX_SANITIZE_SLAB_FAST;
    } else if (!strcmp(str, "full")) {
        pax_sanitize_slab = PAX_SANITIZE_SLAB_FULL;
    }
    ...
}
early_param("pax_sanitize_slab", pax_sanitize_slab_setup);
#endif
mm/slab_common.c
○ But actually PAX_SANITIZE_SLAB_FAST doesn’t do anything. (X_X)
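The accepted spellings of the kernel argument are easiest to see in a standalone re-implementation of the parser (names invented for this sketch):

```c
#include <assert.h>
#include <string.h>

enum sanitize_mode { SAN_OFF, SAN_FAST, SAN_FULL };

/* Parse the value of pax_sanitize_slab= the same way the setup hook does.
 * Returns 0 on success, -1 on an unrecognized spelling. */
static int parse_sanitize_mode(const char *str, enum sanitize_mode *mode)
{
    if (!str)
        return -1;
    if (!strcmp(str, "0") || !strcmp(str, "off"))
        *mode = SAN_OFF;
    else if (!strcmp(str, "1") || !strcmp(str, "fast"))
        *mode = SAN_FAST;
    else if (!strcmp(str, "full"))
        *mode = SAN_FULL;
    else
        return -1;
    return 0;
}
```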
22. Using the security powder
● In order to use it, pass pax_sanitize_slab=full as kernel argument
● Creating a SLAB that is sanitizable
struct kmem_cache *
kmem_cache_create(const char *name, size_t size, size_t align,
                  unsigned long flags, void (*ctor)(void *))
{
    ...
#ifdef CONFIG_PAX_MEMORY_SANITIZE
    if (pax_sanitize_slab == PAX_SANITIZE_SLAB_OFF || (flags & SLAB_DESTROY_BY_RCU))
        flags |= SLAB_NO_SANITIZE;
    else if (pax_sanitize_slab == PAX_SANITIZE_SLAB_FULL)
        flags &= ~SLAB_NO_SANITIZE;
#endif
mm/slab_common.c
● I told you PAX_SANITIZE_SLAB_FAST does nothing...
23. Before anything else (!)
static int calculate_sizes(struct kmem_cache *s, int forced_order)
{
    ...
    if (((flags & (SLAB_DESTROY_BY_RCU | SLAB_POISON)) ||
#ifdef CONFIG_PAX_MEMORY_SANITIZE
        (!(flags & SLAB_NO_SANITIZE)) ||
#endif
        s->ctor)) {
        /*
         * Relocate free pointer after the object if it is not
         * permitted to overwrite the first word of the object on
         * kmem_cache_free.
         *
         * This is the case if we do RCU, have a constructor or
         * destructor or are poisoning the objects.
         */
        s->offset = size;
        size += sizeof(void *);
    }
    ...

static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
{
    *(void **)(object + s->offset) = fp;
}
mm/slub.c
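Why relocate the free pointer? SLUB normally reuses the first word of a free object to link the freelist, and sanitizing the object on free would wipe that word. A hedged userspace sketch (struct and names invented) showing that a free pointer stored at offset == object size survives scrubbing the payload:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct cache { size_t offset; };  /* toy stand-in for struct kmem_cache */

/* Same idea as SLUB's set_freepointer(): store the freelist link at
 * s->offset inside the object. */
static void set_freepointer(struct cache *s, void *object, void *fp)
{
    *(void **)((char *)object + s->offset) = fp;
}

static void *get_freepointer(struct cache *s, void *object)
{
    return *(void **)((char *)object + s->offset);
}
```

With the pointer placed after the payload, zeroing the payload on free leaves the freelist intact.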
28. What is ASLR?
● ASLR is a CONCEPT
● Acronym for: Address Space Layout Randomization
● What it basically does is randomize the address space for each execution
● Does it constrain which binaries we are allowed to run?
● Linux comes with a built-in ASLR
○ But it’s not that good :(
● Activate it with:
$ echo 2 > /proc/sys/kernel/randomize_va_space
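A quick way to eyeball ASLR from userspace: record the address of a stack variable, a heap allocation and a function, and run the program twice (all names here are invented for the demo; with randomize_va_space=2 the addresses should change between runs):

```c
#include <stdio.h>
#include <stdlib.h>

struct layout { void *stack_addr, *heap_addr, *code_addr; };

/* Sample one address from each of the stack, the heap, and the code. */
static struct layout sample_layout(void)
{
    int on_stack;
    struct layout l;

    l.stack_addr = (void *)&on_stack;
    l.heap_addr  = malloc(16);           /* deliberately leaked in this demo */
    l.code_addr  = (void *)sample_layout;
    printf("stack %p heap %p code %p\n", l.stack_addr, l.heap_addr, l.code_addr);
    return l;
}
```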
29. Example
● In theory, on every execution the memory map should be different:
(Diagram: Executions #1, #2 and #3, each showing Code, Heap and Stack at different addresses)
30. Configuring PaX on an ELF basis
● Three (3) types of configuration options (depends on .config)
○ Default configuration (activates all mechanisms all the time)
○ Annotation as part of the ELF header (old method)
○ As part of file system extended attributes
● Check out chpax(1) and paxctl(1)
○ Requires a recompilation with:
■ PAX_PT_PAX_FLAGS / PAX_XATTR_PAX_FLAGS
31. So what do we REALLY want to randomize?
● Each process consists of several parts
○ brk (i.e. for malloc)
○ The ELF itself
○ mmap()s
■ For data
■ For dynamic libraries
○ Stack
● Let’s find out what happens to each of them
32. Brk
● Know load_elf_binary()? Well, you should!
static int load_elf_binary(struct linux_binprm *bprm)
{
    ...
#ifdef CONFIG_PAX_RANDMMAP
    if (current->mm->pax_flags & MF_PAX_RANDMMAP) {
        unsigned long start, size, flags;
        vm_flags_t vm_flags;

        start = ELF_PAGEALIGN(elf_brk);
        size = PAGE_SIZE + ((pax_get_random_long() & ((1UL << 22) - 1UL)) << 4);
        flags = MAP_FIXED | MAP_PRIVATE;
        vm_flags = VM_DONTEXPAND | VM_DONTDUMP;

        down_write(&current->mm->mmap_sem);
        start = get_unmapped_area(NULL, start, PAGE_ALIGN(size), 0, flags);
        ...
        if (retval == 0)
            retval = set_brk(start + size, start + size + PAGE_SIZE);
        ...
    }
#endif
fs/binfmt_elf.c
BTW this is basically a call to prandom_u32()
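The size expression above is just arithmetic, so its range is easy to pin down: PAGE_SIZE plus a 22-bit random value shifted left by 4, i.e. up to roughly 64 MiB of randomized gap after the ELF brk. A hedged sketch (macro name invented to avoid clashing with system headers):

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL   /* assumed x86 page size for the demo */

/* Same formula as the PaX brk gap: PAGE_SIZE + 22 random bits << 4. */
static unsigned long brk_gap(unsigned long rnd)
{
    return DEMO_PAGE_SIZE + ((rnd & ((1UL << 22) - 1UL)) << 4);
}
```

The minimum gap is one page; the maximum is a page plus (2^22 - 1) * 16 bytes, just under 64 MiB, and the low 4 bits of the gap are always zero.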
33. ELF
static int load_elf_binary(struct linux_binprm *bprm)
{
    ...
    load_bias = ELF_ET_DYN_BASE - vaddr;
    if (current->flags & PF_RANDOMIZE)
        load_bias += arch_mmap_rnd();
    load_bias = ELF_PAGESTART(load_bias);

#ifdef CONFIG_PAX_RANDMMAP
    /* PaX: randomize base address at the default exe base if requested */
    if ((current->mm->pax_flags & MF_PAX_RANDMMAP) && elf_interpreter) {
        load_bias = (pax_get_random_long() & ((1UL << PAX_DELTA_MMAP_LEN) - 1)) << PAGE_SHIFT;
        load_bias = ELF_PAGESTART(PAX_ELF_ET_DYN_BASE - vaddr + load_bias);
        elf_flags |= MAP_FIXED;
    }
#endif
    ...
    error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
                    elf_prot, elf_flags, total_size);
    ...
}
fs/binfmt_elf.c
● Regular Linux ASLR: the first load_bias computation
● PaX ASLR: a brutal overwrite of Linux’s previous work on load_bias
● MAP_FIXED makes sure we map it exactly there
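The PaX load_bias computation is plain bit arithmetic and can be sketched in userspace. PAX_DELTA_MMAP_LEN is architecture dependent, so the value 16 below is only an assumed example, and all macro/function names are invented for the demo:

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT     12
#define DEMO_PAGE_SIZE      (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGESTART(a)   ((a) & ~(DEMO_PAGE_SIZE - 1))   /* ELF_PAGESTART analogue */
#define DEMO_DELTA_MMAP_LEN 16   /* assumed; the real PAX_DELTA_MMAP_LEN varies per arch */

/* Same shape as the PaX branch above: take DELTA_MMAP_LEN random bits,
 * shift them into page units, add them to the default base, page-align. */
static unsigned long pax_style_load_bias(unsigned long rnd, unsigned long base,
                                         unsigned long vaddr)
{
    unsigned long bias = (rnd & ((1UL << DEMO_DELTA_MMAP_LEN) - 1UL)) << DEMO_PAGE_SHIFT;

    return DEMO_PAGESTART(base - vaddr + bias);
}
```

The shift by PAGE_SHIFT is what makes every random choice land on a page boundary, giving DELTA_MMAP_LEN bits of entropy in page-sized steps.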
34. Randomizing the Stack
● This decides where the stack top will be placed
#define STACK_TOP ((current->mm->pax_flags & MF_PAX_SEGMEXEC)?SEGMEXEC_TASK_SIZE:TASK_SIZE)
arch/x86/include/asm/processor.h
● And now randomize!
static unsigned long randomize_stack_top(unsigned long stack_top)
{
#ifdef CONFIG_PAX_RANDUSTACK
    if (current->mm->pax_flags & MF_PAX_RANDMMAP)
        return stack_top - current->mm->delta_stack;
#endif
    ...

/* ... and in load_elf_binary ... */
    retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
                             executable_stack);
fs/binfmt_elf.c
● Another brutal hijack by PaX before any original Linux logic
36. Reminder for mmap(2)
● We want to randomize mmap() for:
○ Random data locations (i.e. got)
○ Random dynamic libraries locations (dynamic linking)
NAME
    mmap, munmap - map or unmap files or devices into memory
SYNOPSIS
    #include <sys/mman.h>

    void *mmap(void *start, size_t length, int prot, int flags,
               int fd, off_t offset);
    int munmap(void *start, size_t length);
37. Allegory
● What you’d expect: mappings A, B, C laid out neatly in order
● How it is in reality: A, B, C scattered among many other mappings (D, E, F, G, H, I, J)
38. Randomizing mmap()
void arch_pick_mmap_layout(struct mm_struct *mm)
{
    unsigned long random_factor = 0UL;

#ifdef CONFIG_PAX_RANDMMAP
    if (!(mm->pax_flags & MF_PAX_RANDMMAP))
#endif
    if (current->flags & PF_RANDOMIZE)
        random_factor = arch_mmap_rnd();

    mm->mmap_legacy_base = mmap_legacy_base(mm, random_factor);

    if (mmap_is_legacy()) {
        mm->mmap_base = mm->mmap_legacy_base;
        mm->get_unmapped_area = arch_get_unmapped_area;
    } else {
        mm->mmap_base = mmap_base(mm, random_factor);
        mm->get_unmapped_area = arch_get_unmapped_area_topdown;
    }

#ifdef CONFIG_PAX_RANDMMAP
    if (mm->pax_flags & MF_PAX_RANDMMAP) {
        mm->mmap_legacy_base += mm->delta_mmap;
        mm->mmap_base -= mm->delta_mmap + mm->delta_stack;
    }
#endif
}
arch/x86/mm/mmap.c
● Remember these fellas (the get_unmapped_area pointers). We will use them to allocate new pages.
● Here (the final #ifdef block) we use the random we generated before.
39. Calling the actual logic
unsigned long
arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
                               const unsigned long len, const unsigned long pgoff,
                               const unsigned long flags)
{
    ...
    info.flags = VM_UNMAPPED_AREA_TOPDOWN;
    info.length = len;
    info.low_limit = PAGE_SIZE;
    info.high_limit = mm->mmap_base;
    info.align_mask = 0;
    info.align_offset = pgoff << PAGE_SHIFT;
    if (filp) {
        info.align_mask = get_align_mask();
        info.align_offset += get_align_bits();
    }
    info.threadstack_offset = offset;

    addr = vm_unmapped_area(&info);
arch/x86/kernel/sys_x86_64.c
40. A brief on VMAs
● VMA stands for Virtual Memory Area
● Represents a chunk of virtually contiguous pages
struct vm_area_struct {
    /* The first cache line has the info for VMA tree walking. */
    unsigned long vm_start;  /* Our start address within vm_mm. */
    unsigned long vm_end;    /* The first byte after our end address
                                within vm_mm. */

    /* linked list of VM areas per task, sorted by address */
    struct vm_area_struct *vm_next, *vm_prev;
    struct rb_node vm_rb;

    /*
     * Largest free memory gap in bytes to the left of this VMA.
     * Either between this VMA and vma->vm_prev, or between one of the
     * VMAs below us in the VMA rbtree and its ->vm_prev. This helps
     * get_unmapped_area find a free area of the right size.
     */
    unsigned long rb_subtree_gap;
    ...
} __randomize_layout;
include/linux/mm_types.h
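What get_unmapped_area ultimately answers is "where is a free gap of at least this size between VMAs?". A toy model with a sorted singly linked list makes the question concrete (all names invented; the real kernel answers it in O(log n) using rb_subtree_gap in the red-black tree rather than a linear walk):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal VMA stand-in: [vm_start, vm_end) plus a next pointer,
 * with the list sorted by address. */
struct vma {
    unsigned long vm_start, vm_end;
    struct vma *vm_next;
};

/* Walk the sorted list and return the start of the first gap of at
 * least 'len' bytes between consecutive VMAs, or 0 if none exists. */
static unsigned long first_fit_gap(struct vma *head, unsigned long len)
{
    for (struct vma *v = head; v && v->vm_next; v = v->vm_next)
        if (v->vm_next->vm_start - v->vm_end >= len)
            return v->vm_end;
    return 0;
}
```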
ZONE_DMA is for DMA controllers
ZONE_NORMAL implements the Buddy System
ZONE_HIGHMEM is for everything else: temporary mappings, usermode memory etc…
Focus on ZONE_NORMAL.
Implements Slab Allocator (kmalloc uses it!)
Slab is implemented on top of the Buddy System.
Using it for fast access to memory
Note that copy_from/to_user is ARCHITECTURE DEPENDENT.
Why? Every architecture has a different way of invalidating caches, loading/unloading pages to swap etc...
Eventually, we want to execute this function so we avoid pax_report_usercopy.
The purpose of this function is mainly looking up the pointer’s slab.
First make some basic sanity checks (not null, is mapped (virt_addr_valid))
Then get the first page of the page chain for this pointer
Then check if this pointer is even backed by any slab cache
This protection adds a parameter to SLABs to be marked so they can be used for communication with usermode.
If this SLAB is not one of these, then we shouldn’t allow a copy.
Finally, check that the copy is within the size of a single object of the SLAB
Otherwise, return the name of the SLAB so the previous function will fail.
Few basic checks:
Len does not overflow
The object is not inside the (kernel!!) stack
The object does not overrun the stack partially.
We traverse the entire stack, frame by frame
For each frame, we make sure the object ends before the next frame...
...and starts after the current frame + 2 pointers
Know what they are?
Psst: return address and frame pointer
For those that attended my previous lecture, does anyone see a semi-problem here?
Do note that this DOES NOT comply with CONFIG_CC_STACKPROTECTOR
Very basic. Checks against start of text and end of text symbols.
Double check for any case of double mapping (i.e. vector table)
Eventually, we want to execute this function so we avoid pax_report_usercopy.
PAX_SANITIZE_SLAB_FAST doesn’t do anything.
Use PAX_SANITIZE_SLAB_FULL
This shows PAX_SANITIZE_SLAB_FAST doesn’t do anything.
Doesn’t really show anything else… :P
Why is this? Anyone knows why this is?
Well, the answer is written in the comment...
The ctor is here because we don’t want to waste time initializing a SLAB on every allocation.
It’s important to know when your functions are called!
Why does it suspend the interrupts?
Well, zeroing the entire page might take a while and we don’t want anyone using it in the meantime...
Yes. ASLR is a concept.
Constraint: (generally!) only position independent code.
In practice: position dependent code has a much lower entropy.
Yes. ASLR is a concept.
Stuff to randomize
Brk
ELF
mmap()s
Data
Libraries
Stack
Basically, this reserves a random-sized gap for the brk area
Then sets the brk at start + size.
This is pretty straightforward
Note Linux’s basic ASLR
Note that PaX brutally overwrites Linux’s work for load_bias
First part of the function defines the higher-most address of the stack VMA
[p] is not very indicative. It indicates the highest address in memory.
So there are two possible actions here:
Either we exceed the rlimit, so we will shrink the stack
Or we will expand the stack according to this weird “random” const
All in all, the randomness of the stack is conveyed by the vma->end address we use as stack_top
We are going to execute arch_get_unmapped_area_topdown() to allocate new memory.
Using mmap_base as the boundary for our allocation.
Just highlight the info.high_limit
Notice that this is the only thing that is being randomized.
So actually, mmap randomness is based on disallowing processes to allocate any higher than a certain address.
VMA consists of a start address and an end address.
It is linked to the other VMAs of the process via two datatypes
Doubly linked list
Red-Black tree.
It also has a “gap” member that represents the largest unmapped area between VMAs of the current subtree (of the Red-Black tree).