This talk discusses shortcomings of existing memory managers and proposes solutions. Current memory managers are inadequate for high-performance applications on modern multiprocessor architectures: they limit scalability and performance. The talk introduces the Heap Layers framework for building customizable memory managers, and describes Hoard, a provably scalable memory manager that bounds memory consumption by explicitly tracking utilization and moving free memory to a global heap. Finally, it presents Reap, an extended memory manager for server applications.
Memory Management for High-Performance Applications
1. Memory Management for High-Performance Applications
Emery Berger
University of Massachusetts Amherst
2. High-Performance Applications
Web servers, search engines, scientific codes (written in C or C++)
Run on one or a cluster of server boxes (multiple CPUs, RAM, RAID drives)
Needs support at every level:
  compiler
  runtime system
  operating system
  hardware
[Diagram: multi-CPU server boxes with RAM and RAID drives beneath a software stack]
3. New Applications,
Old Memory Managers
Applications and hardware have changed
Multiprocessors now commonplace
Object-oriented, multithreaded
Increased pressure on the memory manager (malloc, free)
But memory managers have not kept up
Inadequate support for modern applications
4. Current Memory Managers Limit Scalability
As we add processors, the program slows down
Caused by heap contention
[Chart: Runtime Performance; speedup vs. number of processors (1-14), Ideal vs. Actual]
Larson server benchmark on a 14-processor Sun
5. The Problem
Current memory managers are inadequate for high-performance applications on modern architectures
They limit scalability & application performance
6. This Talk
Building memory managers
Heap Layers framework
Problems with current memory managers
Contention, false sharing, space
Solution: provably scalable memory manager
Hoard
Extended memory manager for servers
Reap
7. Implementing Memory Managers
Memory managers must be
Space efficient
Very fast
Heavily-optimized C code
Hand-unrolled loops
Macros
Monolithic functions
Hard to write, reuse, or extend
8. Real Code: DLmalloc 2.7.2
#define chunksize(p)          ((p)->size & ~(SIZE_BITS))
#define next_chunk(p)         ((mchunkptr)(((char*)(p)) + ((p)->size & ~PREV_INUSE)))
#define prev_chunk(p)         ((mchunkptr)(((char*)(p)) - ((p)->prev_size)))
#define chunk_at_offset(p, s) ((mchunkptr)(((char*)(p)) + (s)))
#define inuse(p) \
  ((((mchunkptr)(((char*)(p)) + ((p)->size & ~PREV_INUSE)))->size) & PREV_INUSE)
#define set_inuse(p) \
  ((mchunkptr)(((char*)(p)) + ((p)->size & ~PREV_INUSE)))->size |= PREV_INUSE
#define clear_inuse(p) \
  ((mchunkptr)(((char*)(p)) + ((p)->size & ~PREV_INUSE)))->size &= ~(PREV_INUSE)
#define inuse_bit_at_offset(p, s) \
  (((mchunkptr)(((char*)(p)) + (s)))->size & PREV_INUSE)
#define set_inuse_bit_at_offset(p, s) \
  (((mchunkptr)(((char*)(p)) + (s)))->size |= PREV_INUSE)
#define MALLOC_ZERO(charp, nbytes) \
do { \
  INTERNAL_SIZE_T* mzp = (INTERNAL_SIZE_T*)(charp); \
  CHUNK_SIZE_T mctmp = (nbytes)/sizeof(INTERNAL_SIZE_T); \
  long mcn; \
  if (mctmp < 8) mcn = 0; else { mcn = (mctmp-1)/8; mctmp %= 8; } \
  switch (mctmp) { \
    case 0: for(;;) { *mzp++ = 0; \
    case 7:           *mzp++ = 0; \
    case 6:           *mzp++ = 0; \
    case 5:           *mzp++ = 0; \
    case 4:           *mzp++ = 0; \
    case 3:           *mzp++ = 0; \
    case 2:           *mzp++ = 0; \
    case 1:           *mzp++ = 0; if (mcn <= 0) break; mcn--; } \
  } \
} while (0)
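(For reference: MALLOC_ZERO above is a hand-unrolled, Duff's-device-style zeroing loop packed into a macro, exactly the kind of monolithic, heavily optimized C that the previous slide describes as hard to write, reuse, or extend.)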
9. Programming Language Support
Classes: overhead, rigid hierarchy
Mixins: no overhead, flexible hierarchy
Sounds great...
10. A Heap Layer
C++ mixin with malloc & free methods:

template <class SuperHeap>
class GreenHeapLayer :
  public SuperHeap {…};

[Diagram: RedHeapLayer layered on top of GreenHeapLayer]
11. Example: Thread-Safe Heap Layer
LockedHeap: protect the superheap with a lock
LockedMallocHeap: a LockedHeap layered over mallocHeap
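A minimal sketch of what such a layer might look like in modern C++ (the SuperHeap mixin pattern follows the previous slide; the std::mutex and the MallocHeap bottom layer are assumptions for illustration, not the original Heap Layers code):

#include <cstddef>
#include <cstdlib>
#include <mutex>

// Bottom layer: forwards to the system allocator.
class MallocHeap {
public:
  void * malloc (std::size_t sz) { return std::malloc (sz); }
  void free (void * ptr)         { std::free (ptr); }
};

// Thread-safety layer: serialize every call to the superheap with a lock.
template <class SuperHeap>
class LockedHeap : public SuperHeap {
public:
  void * malloc (std::size_t sz) {
    std::lock_guard<std::mutex> guard (_lock);
    return SuperHeap::malloc (sz);
  }
  void free (void * ptr) {
    std::lock_guard<std::mutex> guard (_lock);
    SuperHeap::free (ptr);
  }
private:
  std::mutex _lock;
};

// Mixin composition: a thread-safe heap built from two reusable layers.
using LockedMallocHeap = LockedHeap<MallocHeap>;

Because the layering happens at compile time, the composed heap pays no virtual-call overhead, which is the "no overhead" property claimed for mixins on the previous slide.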
12. Empirical Results
Heap Layers vs. the originals:
  KingsleyHeap vs. the BSD (Kingsley) allocator
  LeaHeap vs. DLmalloc 2.7
Competitive runtime and memory efficiency
[Charts: runtime and space normalized to the Lea allocator, for Kingsley, KingsleyHeap, Lea, and LeaHeap across cfrac, espresso, lindsay, LRUsim, perl, roboop, and their average]
13. Overview
Building memory managers
Heap Layers framework
Problems with memory managers
Contention, space, false sharing
Solution: provably scalable allocator
Hoard
Extended memory manager for servers
Reap
14. Problems with General-Purpose Memory Managers
Previous work for multiprocessors:
  Concurrent single heap [Bigler et al. 85, Johnson 91, Iyengar 92]: impractical
  Multiple heaps [Larson 98, Gloger 99]: reduce contention but, as we show, cause other problems:
    P-fold or even unbounded increase in space
    Allocator-induced false sharing
15. Multiple Heap Allocator:
Pure Private Heaps
One heap per processor:
  malloc gets memory from its local heap
  free puts memory on its local heap
Used by STL, Cilk, and ad hoc allocators
[Diagram: processors 0 and 1 each allocate objects (x1 through x4) and free them entirely on their own heaps; key: in use vs. free, and which heap the memory sits on]
17. Multiple Heap Allocator:
Private Heaps with Ownership
free returns memory to the original (owning) heap
Bounded memory consumption: no crash!
Used by "Ptmalloc" (Linux) and LKmalloc
[Diagram: processor 0 runs x1 = malloc(1) and x2 = malloc(1); processor 1 frees x1 and x2, returning them to processor 0's heap]
18. Problem:
P-fold Memory Blowup
Occurs in practice: round-robin producer-consumer
  processor i mod P allocates; processor (i+1) mod P frees
[Diagram: processor 0 runs x1 = malloc(1); processor 1 runs free(x1) and x2 = malloc(1); processor 2 runs free(x2) and x3 = malloc(1); and so on]
Footprint = 1 (2GB), but space = 3 (6GB)
Exceeds the 32-bit address space: crash!
19. Problem:
Allocator-Induced False Sharing
False sharing: non-shared objects on the same cache line
The bane of parallel applications; extensively studied
All these allocators cause false sharing!
[Diagram: processor 0 and processor 1 each call malloc(1); the two objects land on one cache line, so CPU 0 and CPU 1 thrash the line back and forth across the bus]
20. So What Do We Do Now?
Where do we put free memory?
  On a central heap: heap contention
  On our own heap (pure private heaps): unbounded memory consumption
  On the original heap (private heaps with ownership): P-fold blowup
How do we avoid false sharing?
21. Overview
Building memory managers
Heap Layers framework
Problems with memory managers
Contention, space, false sharing
Solution: provably scalable allocator
Hoard
Extended memory manager for servers
Reap
22. Hoard: Key Insights
Bound local memory consumption
Explicitly track utilization
Move free memory to a global heap
Provably bounds memory consumption
Manage memory in large chunks
Avoids false sharing
Reduces heap contention
23. Overview of Hoard
Manage memory in page-sized heap blocks
  Avoids false sharing
Allocate from the local (per-processor) heap block
  Avoids heap contention
On low utilization, move the heap block to the global heap
  Avoids space blowup
[Diagram: per-processor heaps (processor 0 … processor P-1) exchanging heap blocks with a global heap]
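A rough sketch of the release rule behind "move free memory to a global heap" (the constants f and K, and all names here, are illustrative assumptions; the real Hoard applies this kind of check per size class to its page-sized heap blocks):

#include <cstddef>

// Per-processor heap statistics (illustrative).
struct LocalHeapStats {
  std::size_t inUse;       // u: bytes currently allocated to the application
  std::size_t allocated;   // a: bytes this heap holds in heap blocks
};

const double      kEmptyFraction = 0.25;  // f: allowed fraction of free memory (assumed value)
const std::size_t kBlockSize     = 4096;  // S: one page-sized heap block
const std::size_t kSlack         = 2;     // K: allowed number of spare free blocks (assumed value)

// After a free, decide whether this heap now holds too much free memory.
// If it does, a Hoard-style manager moves a mostly-empty heap block to the
// global heap, where other processors can reuse it.
bool shouldReleaseBlock (const LocalHeapStats & h) {
  bool tooEmpty    = h.inUse < (1.0 - kEmptyFraction) * h.allocated;
  bool enoughSlack = h.inUse + kSlack * kBlockSize < h.allocated;
  return tooEmpty && enoughSlack;
}

Because a block only leaves a local heap when that heap is demonstrably holding too much free memory, each processor's contribution to blowup stays bounded, which is what the space bound on the next slide formalizes.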
24. Summary of Analytical Results
Space consumption: near-optimal worst case
  Hoard: O(n log(M/m) + P), with P « n
  Optimal: O(n log(M/m))  [Robson 70]
  Private heaps with ownership: O(P n log(M/m))
  (n = memory required, M = biggest object size, m = smallest object size, P = processors)
Provably low synchronization
25. Empirical Results
Measure runtime on 14-processor Sun
Allocators
Solaris (system allocator)
Ptmalloc (GNU libc)
mtmalloc (Sun’s “MT-hot” allocator)
Micro-benchmarks
Threadtest: no sharing
Larson: sharing (server-style)
Cache-scratch: mostly reads & writes (tests for false sharing)
Real application experience similar
26. Runtime Performance:
threadtest
Many threads, no sharing
Hoard achieves linear speedup
speedup(x, P) = runtime(Solaris allocator on one processor) / runtime(x on P processors)
27. Runtime Performance:
Larson
Many threads, with sharing (server-style)
Hoard achieves linear speedup
28. Runtime Performance:
false sharing
Many threads, mostly reads & writes of heap data
Hoard achieves linear speedup
29. Hoard in the “Real World”
Open source code
www.hoard.org
13,000 downloads
Solaris, Linux, Windows, IRIX, …
Widely used in industry
AOL, British Telecom, Novell, Philips
Reports: 2x-10x, “impressive” improvement in performance
Search server, telecom billing systems, scene rendering,
real-time messaging middleware, text-to-speech engine,
telephony, JVM
Scalable general-purpose memory manager
30. Overview
Building memory managers
Heap Layers framework
Problems with memory managers
Contention, space, false sharing
Solution: provably scalable allocator
Hoard
Extended memory manager for servers
Reap
31. Custom Memory Allocation
Replace new/delete, bypassing the general-purpose allocator
  Reduce runtime: often
  Expand functionality: sometimes
  Reduce space: rarely
Very common practice: Apache, gcc, lcc, STL, database servers…
Language-level support in C++ ("Use custom allocators")
32. The Reality
The Lea allocator is often as fast or faster
Custom allocation is ineffective, except for regions [OOPSLA 2002]
[Chart: Runtime, Custom Allocator Benchmarks; normalized runtime of Custom, Win32, and DLmalloc across the non-region benchmarks (197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc), the region benchmarks (apache, lcc, mudlle), and the averages]
33. Overview of Regions
Separate areas, deletion only en masse
  regioncreate(r)
  regionmalloc(r, sz)
  regiondelete(r)
+ Fast: pointer-bumping allocation, deletion of chunks
+ Convenient: one call frees all memory
- Risky: accidental deletion
- Too much space
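A minimal sketch of the mechanism behind these properties (a single-chunk, fixed-capacity illustration; the class name, the alignment rule, and the fixed chunk size are assumptions, and real region libraries grow by chaining chunks):

#include <cstddef>
#include <cstdlib>
#include <new>

// A toy region: one chunk, pointer-bumping allocation, en masse deletion.
class Region {
public:
  explicit Region (std::size_t capacity)
    : _start (static_cast<char *> (std::malloc (capacity))),
      _bump (_start),
      _end (_start + capacity) {
    if (!_start) throw std::bad_alloc ();
  }
  ~Region () { std::free (_start); }          // one call frees all memory

  // Pointer-bumping allocation: no per-object headers, no free lists.
  void * allocate (std::size_t sz) {
    sz = (sz + 7) & ~std::size_t (7);         // keep 8-byte alignment
    if (sz > static_cast<std::size_t> (_end - _bump)) return nullptr;
    void * p = _bump;
    _bump += sz;
    return p;
  }
  // Note what is missing: there is no way to free an individual object,
  // which is exactly the "too much space" drawback listed above.
private:
  char * _start;
  char * _bump;
  char * _end;
};

Reap, introduced later in the talk, adds individual object deletion on top of exactly this region style.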
34. Why Regions?
Apparently faster, more space-efficient
Servers need memory management support:
Avoid resource leaks
Tear down memory associated with terminated
connections or transactions
Current approach (e.g., Apache): regions
35. Drawbacks of Regions
Can't reclaim memory within regions
  A problem for long-running computations, producer-consumer patterns, and off-the-shelf "malloc/free" programs: unbounded memory consumption
Current situation for Apache:
  vulnerable to denial of service
  limits the runtime of connections
  limits module programming
36. Reap Hybrid Allocator
Reap = region + heap
Adds individual object deletion & a heap
  reapcreate(r)
  reapmalloc(r, sz)
  reapfree(r, p)
  reapdelete(r)
Can reduce memory consumption
+ Fast
+ Adapts to use (region or heap style)
+ Cheap deletion
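To make the "adapts to use" idea concrete, here is a self-contained toy hybrid in the same spirit (an invented design for illustration, not the actual Reap implementation): allocation bumps a pointer as in a region, freed objects go onto per-size free lists and are reused, and destroying the object releases everything at once.

#include <cstddef>
#include <map>
#include <vector>

class ToyReap {
public:
  void * allocate (std::size_t sz) {
    sz = (sz + 7) & ~std::size_t (7);
    std::vector<void*> & freed = _freeLists[sz];
    if (!freed.empty ()) {                     // heap-style reuse of freed objects
      void * p = freed.back ();
      freed.pop_back ();
      return p;
    }
    if (_bump + sz > sizeof (_chunk)) return nullptr;
    void * p = _chunk + _bump;                 // region-style pointer bumping
    _bump += sz;
    return p;
  }
  void deallocate (void * p, std::size_t sz) { // individual object deletion
    sz = (sz + 7) & ~std::size_t (7);
    _freeLists[sz].push_back (p);
  }
  // Destroying the ToyReap releases the whole chunk at once, like reapdelete.
private:
  char        _chunk[1 << 16] = {};            // fixed chunk for simplicity
  std::size_t _bump = 0;
  std::map<std::size_t, std::vector<void*>> _freeLists;
};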
37. Using Reap as Regions
[Chart: Runtime, Region-Based Benchmarks; normalized runtime of Original, Win32, DLmalloc, WinHeap, Vmalloc, and Reap on lcc and mudlle (one bar is clipped at 4.08)]
Reap performance nearly matches regions
38. Reap: Best of Both Worlds
Combining new/delete with regions is usually impossible:
  Incompatible APIs
  Hard to rewrite code
Using Reap: incorporated new/delete code into Apache
  "mod_bc" (an arbitrary-precision calculator)
  Changed 20 lines (out of 8000)
  Benchmark: compute the 1000th prime
    With Reap: 240K
    Without Reap: 7.4MB
39. Summary
Building memory managers
Heap Layers framework [PLDI 2001]
Problems with current memory managers
Contention, false sharing, space
Solution: provably scalable memory manager
Hoard [ASPLOS-IX]
Extended memory manager for servers
Reap [OOPSLA 2002]
40. Current Projects
CRAMM: Cooperative Robust Automatic Memory Management
  Garbage collection without paging
  Automatic heap sizing
SAVMM: Scheduler-Aware Virtual Memory Management
Markov: a programming language for building high-performance servers
COLA: Customizable Object Layout Algorithms
  Improving locality in Java
41. www.cs.umass.edu/~plasma
43. Looking Forward
“New” programming languages
Increasing use of Java = garbage collection
New architectures
NUMA, SMT/CMP (“hyperthreading”)
Technology trends
Memory hierarchy
44. The Ever-Steeper Memory Hierarchy
Higher = smaller, faster, closer to the CPU
A real desktop machine (mine):
  registers: 8 integer, 8 floating-point; 1-cycle latency
  L1 cache: 8K data & instructions; 2-cycle latency
  L2 cache: 512K; 7-cycle latency
  RAM: 1GB; 100-cycle latency
  Disk: 40GB; 38,000,000-cycle latency (!)
45. Swapping & Throughput
When the heap exceeds available memory, throughput plummets
46. Why Manage Memory At All?
Just buy more!
Simplifies memory management
Still have to collect garbage eventually…
Workload fits in RAM = no more swapping!
Sounds great…
47. Memory Prices Over Time
[Chart: RAM Prices Over Time, in 1977 dollars per GB, 1977-2005; successive generations of conventional DRAM (2K, 8K, 32K, 128K, 512K, 2M, 8M) drive prices down steadily from roughly $10,000/GB toward $0.01/GB]
"Soon it will be free…"
48. Memory Prices: Inflection Point!
[Chart: RAM Prices Over Time, in 1977 dollars per GB, 1977-2005; conventional DRAM (2K through 8M) keeps falling, but the newer SDRAM, RDRAM, DDR, and Chipkill parts (512M, 1G) show an inflection point: prices no longer drop as before]
49. Memory Is Actually Expensive
Desktops:
  Most ship with 256MB
  1GB = 50% more $$ (laptops: 70%, if possible at all; limited capacity)
Servers:
  Buy 4GB, get 1 CPU free!
  Sun Enterprise 10000: 8GB extra = $150,000! (8GB of Sun RAM = 1 Ferrari Modena)
Fast RAM: new technologies
Cosmic rays…
50. Key Problem: Paging
Garbage collectors: VM oblivious
GC disrupts LRU queue
Touches non-resident pages
Virtual memory managers: GC oblivious
Likely to evict pages needed by GC
Paging
Orders of magnitude more time than RAM
BIG hit in performance and LONG pauses
51. Cooperative Robust Automatic Memory Management (CRAMM)
[Diagram: the garbage collector and the virtual memory manager cooperate. The collector registers as a "cooperative application." The VM manager tracks per-process and overall memory utilization and sends coarse-grained (heap-level) notifications of changes in memory pressure, to which the collector responds by adjusting its heap size. The VM manager also sends fine-grained (page-level) page-eviction notifications when page replacement selects victim pages, and the collector evacuates those pages.]
Joint work with Eliot Moss (UMass) and Scott Kaplan (Amherst College)
52. Fine-Grained Cooperative GC
[Diagram: page replacement in the virtual memory manager selects victim pages and sends fine-grained page-eviction notifications; the garbage collector evacuates those pages]
Goal: GC triggers no additional paging
Key ideas:
  Adapt the collection strategy on the fly
  Page-oriented memory management
  Exploit detailed page information from the VM
53. Summary
Building memory managers
Heap Layers framework
Problems with memory managers
Contention, space, false sharing
Solution: provably scalable allocator
Hoard
Future directions
54. If You Have to Spend $$...
more Ferraris: good
more memory: bad
56. This Page Intentionally Left Blank
57. Virtual Memory Manager Support
New VM required: detailed page-level information
“Segmented queue” for low overhead (unprotected and protected segments)
Local, per-process LRU ordering, not global LRU (gLRU, as in Linux)
Complementary to SAVM work:
“Scheduler-Aware Virtual Memory manager”
Under development – modified Linux kernel
58. Current Work: Robust Performance
Currently: no VM-GC communication
  BAD interactions under memory pressure
Our approach (with Eliot Moss, Scott Kaplan): Cooperative Robust Automatic Memory Management
[Diagram: the virtual memory manager passes memory-pressure and LRU-queue information to the garbage collector / allocator, which hands back empty pages, reducing paging impact]
59. Current Work: Predictable VMM
Recent work on scheduling for QoS
E.g., proportional-share
Under memory pressure, the VMM is effectively the scheduler
  Paged-out processes may never recover
  Intermittent processes may wait a long time
Scheduler-faithful virtual memory
(with Scott Kaplan, Prashant Shenoy)
Based on page value rather than order
60. Conclusion
Memory management for high-performance applications
Heap Layers framework [PLDI 2001]
Reusable components, no runtime cost
Hoard scalable memory manager [ASPLOS-IX]
High-performance, provably scalable & space-efficient
Reap hybrid memory manager [OOPSLA 2002]
Provides speed & robustness for server applications
Current work: robust memory management for
multiprogramming
61. The Obligatory URL Slide
http://www.cs.umass.edu/~emery
62. If You Can Read This,
I Went Too Far
63. Hoard: Under the Hood
[Diagram: Hoard's heap-layer composition. A SelectSizeHeap selects a heap based on object size: large objects (> 4K) go to a MallocOrFreeHeap, while smaller requests go to a PerProcessorHeap that mallocs from a local heap block and frees to the owning heap block (FreeToHeapBlock). LockedHeap-protected HeapBlockManagers get empty heap blocks from, and return them to, the SuperblockHeap, which gets or returns memory to the global System Heap.]
64. Custom Memory Allocation
Replace new/delete, bypassing the general-purpose allocator
  Reduce runtime: often
  Expand functionality: sometimes
  Reduce space: rarely
Very common practice: Apache, gcc, lcc, STL, database servers…
Language-level support in C++ ("Use custom allocators")
65. Drawbacks of Custom Allocators
Avoiding the memory manager means:
  More code to maintain & debug
  Can't use memory debuggers
  Not modular or robust: mixing memory from custom and general-purpose allocators → crash!
  Increased burden on programmers
66. Overview
Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps – generalization of regions & heaps
67. (1) Per-Class Allocators
Recycle freed objects from a per-class free list
  a = new Class1;
  b = new Class1;
  c = new Class1;
  delete a;
  delete b;
  delete c;
  a = new Class1;
  b = new Class1;
  c = new Class1;
[Diagram: Class1's free list collects the deleted objects a, b, and c; the later news pop them back off the list]
+ Fast: linked-list operations
+ Simple
+ Identical semantics
+ C++ language support
- Possibly space-inefficient
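A minimal sketch of this pattern (assuming a single-threaded program and that sizeof(Class1) >= sizeof(void*); the class name and payload are placeholders, while the class-level operator new/delete is the C++ language support referred to above):

#include <cstddef>
#include <cstdlib>

class Class1 {
public:
  // Class-specific allocation: recycle a previously deleted object if possible.
  static void * operator new (std::size_t sz) {
    if (freeList != nullptr) {
      void * p = freeList;
      freeList = *static_cast<void **> (p);   // next pointer lives in the dead object
      return p;
    }
    return std::malloc (sz);                  // otherwise fall back to the general heap
  }
  // Class-specific deallocation: push the object onto the free list.
  // Objects are never returned to the system, which is the possible
  // space inefficiency noted above.
  static void operator delete (void * p) {
    *static_cast<void **> (p) = freeList;
    freeList = p;
  }
private:
  static void * freeList;
  double payload[4];                          // stand-in for the class's real fields
};

void * Class1::freeList = nullptr;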
68. (II) Custom Patterns
Tailor-made to fit allocation patterns
Example: 197.parser (natural-language parser)
  A single char[MEMORY_LIMIT] array with an end_of_array pointer
  a = xalloc(8);
  b = xalloc(16);
  c = xalloc(8);
  xfree(b);
  xfree(c);
  d = xalloc(8);
+ Fast
+ Pointer-bumping allocation
- Brittle
- Fixed memory size
- Requires stack-like lifetimes
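A sketch in the same spirit (the array size and the exact behavior of xfree are assumptions about 197.parser's allocator, used only to illustrate why lifetimes must nest like a stack):

#include <cassert>
#include <cstddef>

// One fixed array plus a bump index; frees simply roll the index back.
const std::size_t MEMORY_LIMIT = 1 << 20;     // illustrative size
static char        memory[MEMORY_LIMIT];
static std::size_t end_of_array = 0;

static void * xalloc (std::size_t sz) {
  assert (end_of_array + sz <= MEMORY_LIMIT); // fixed memory size: brittle
  void * p = memory + end_of_array;           // pointer-bumping allocation
  end_of_array += sz;
  return p;
}

static void xfree (void * p) {
  // Roll the bump index back to p, releasing p and everything allocated after it:
  // correct only for stack-like (LIFO) lifetimes.
  end_of_array = static_cast<std::size_t> (static_cast<char *> (p) - memory);
}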
69. (III) Regions
Separate areas, deletion only en masse
  regioncreate(r)
  regionmalloc(r, sz)
  regiondelete(r)
+ Fast: pointer-bumping allocation, deletion of chunks
+ Convenient: one call frees all memory
- Risky: accidental deletion
- Too much space
70. Overview
Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps – generalization of regions & heaps
71. Custom Allocators Are Faster…
[Chart: Runtime, Custom Allocator Benchmarks; normalized runtime of Custom and Win32 across the non-region benchmarks (197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc), the region benchmarks (apache, lcc, mudlle), and the averages; the custom allocators appear faster]
72. Not So Fast…
[Chart: the same Custom Allocator Benchmarks with DLmalloc added; the Lea allocator matches or beats the custom allocators everywhere except the region-based benchmarks]
73. The Lea Allocator (DLmalloc 2.7.0)
Optimized for common allocation patterns
Per-size quicklists ≈ per-class allocation
Deferred coalescing (combining adjacent free objects)
Highly optimized fast path
Space-efficient
74. Space Consumption Results
[Chart: Space, Custom Allocator Benchmarks; normalized space of the Original (custom) allocators vs. DLmalloc across the non-region benchmarks, the region benchmarks, and the averages]
75. Overview
Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps – generalization of regions & heaps
76. Why Regions?
Apparently faster, more space-efficient
Servers need memory management support:
Avoid resource leaks
Tear down memory associated with terminated
connections or transactions
Current approach (e.g., Apache): regions
77. Drawbacks of Regions
Can't reclaim memory within regions
  A problem for long-running computations, producer-consumer patterns, and off-the-shelf "malloc/free" programs: unbounded memory consumption
Current situation for Apache:
  vulnerable to denial of service
  limits the runtime of connections
  limits module programming
78. Reap Hybrid Allocator
Reap = region + heap
Adds individual object deletion & a heap
  reapcreate(r)
  reapmalloc(r, sz)
  reapfree(r, p)
  reapdelete(r)
Can reduce memory consumption
+ Fast
+ Adapts to use (region or heap style)
+ Cheap deletion
79. Using Reap as Regions
[Chart: Runtime, Region-Based Benchmarks; normalized runtime of Original, Win32, DLmalloc, WinHeap, Vmalloc, and Reap on lcc and mudlle (one bar is clipped at 4.08)]
Reap performance nearly matches regions
80. Reap: Best of Both Worlds
Combining new/delete with regions is usually impossible:
  Incompatible APIs
  Hard to rewrite code
Using Reap: incorporated new/delete code into Apache
  "mod_bc" (an arbitrary-precision calculator)
  Changed 20 lines (out of 8000)
  Benchmark: compute the 1000th prime
    With Reap: 240K
    Without Reap: 7.4MB
81. Conclusion
Empirical study of custom allocators
Lea allocator often as fast or faster
Custom allocation ineffective,
except for regions
Reaps:
Nearly matches region performance
without other drawbacks
Take-home message:
Stop using custom memory allocators!
83. Experimental Methodology
Comparing to general-purpose allocators
  Same semantics: no problem (e.g., disable per-class allocators)
  Different semantics: use an emulator
    Uses the general-purpose allocator but adds bookkeeping
    regionfree: free all associated objects
    Other functionality (nesting, obstacks)
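A minimal sketch of such an emulator (the structure and names are assumptions for illustration): every region allocation goes through the general-purpose allocator, with just enough bookkeeping that freeing the region releases all associated objects.

#include <cstddef>
#include <cstdlib>
#include <vector>

// Emulated region: remember every pointer handed out so that a single call
// can free them all, mimicking regiondelete on top of plain malloc/free.
struct EmulatedRegion {
  std::vector<void*> objects;
};

static void * region_malloc (EmulatedRegion & r, std::size_t sz) {
  void * p = std::malloc (sz);      // general-purpose allocation
  if (p) r.objects.push_back (p);   // bookkeeping
  return p;
}

static void region_free_all (EmulatedRegion & r) {
  for (void * p : r.objects)        // free all associated objects
    std::free (p);
  r.objects.clear ();
}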
84. Use Custom Allocators?
Strongly recommended by practitioners
Little hard data on performance/space improvements
Only one previous study [Zorn 1992]:
  Focused on just one type of allocator
  Custom allocators: a waste of time (small gains, bad allocators)
Are different allocators better? What are the trade-offs?
85. Kinds of Custom Allocators
Three basic types of custom allocators
Per-class
Fast
Custom patterns
Fast, but very special-purpose
Regions
Fast, possibly more space-efficient
Convenient
Variants: nested, obstacks
86. Optimization Opportunity
[Chart: Time Spent in Memory Operations, as % of runtime, split into memory operations vs. other, across the custom-allocation benchmarks (197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle) and their average]
88. Custom Memory Allocation
Programmers often replace malloc/free
Attempt to increase performance
Provide extra functionality (e.g., for servers)
Reduce space (rarely)
Empirical study of custom allocators
Lea allocator often as fast or faster
Custom allocation ineffective,
except for regions. [OOPSLA 2002]
89. Overview of Regions
Separate areas, deletion only en masse
  regioncreate(r)
  regionmalloc(r, sz)
  regiondelete(r)
+ Fast: pointer-bumping allocation, deletion of chunks
+ Convenient: one call frees all memory
- Risky: accidental deletion
- Too much space
90. Why Regions?
Apparently faster, more space-efficient
Servers need memory management support:
Avoid resource leaks
Tear down memory associated with terminated
connections or transactions
Current approach (e.g., Apache): regions
91. Drawbacks of Regions
Can't reclaim memory within regions
  A problem for long-running computations, producer-consumer patterns, and off-the-shelf "malloc/free" programs: unbounded memory consumption
Current situation for Apache:
  vulnerable to denial of service
  limits the runtime of connections
  limits module programming
92. Reap Hybrid Allocator
Reap = region + heap
Adds individual object deletion & a heap
  reapcreate(r)
  reapmalloc(r, sz)
  reapfree(r, p)
  reapdelete(r)
Can reduce memory consumption
+ Fast
+ Adapts to use (region or heap style)
+ Cheap deletion
93. Using Reap as Regions
[Chart: Runtime, Region-Based Benchmarks; normalized runtime of Original, Win32, DLmalloc, WinHeap, Vmalloc, and Reap on lcc and mudlle (one bar is clipped at 4.08)]
Reap performance nearly matches regions
94. Reap: Best of Both Worlds
Combining new/delete with regions is usually impossible:
  Incompatible APIs
  Hard to rewrite code
Using Reap: incorporated new/delete code into Apache
  "mod_bc" (an arbitrary-precision calculator)
  Changed 20 lines (out of 8000)
  Benchmark: compute the 1000th prime
    With Reap: 240K
    Without Reap: 7.4MB