This document describes a workshop on high performance statistical computing. It discusses principles for improving computational performance, including identifying goals, problems, algorithms, and architectures that can impact performance. It outlines steps for achieving faster results, such as predicting resource needs, identifying bottlenecks, discovering inefficient code sections, decomposing problems, and distributing work across multiple resources. The document provides examples of how algorithmic complexity and problem complexity can greatly influence runtime. It also reviews computer system architectures and hierarchies from registers to offline storage.
High Performance Statistical Computing
1. High Performance Statistical Computing with Applications in the Social Sciences
Micah Altman
Senior Research Scientist
"Introduction to the RCE" by Earl Robert Kinney
Manager, Research Computing Environment
Institute for Quantitative Social Science
Harvard University
2. Goals for today
Analysis
Describe performance goals
Identify resource use patterns
Identify resource bottlenecks
Identify performance hot-spots
Select problem decomposition
Application
Connect to RCE
Use the RCE to analyze larger data sets
Use the RCE to run interactive analyses more quickly
Use the RCE to run large numbers of analyses independently
[Source: Wikimedia Commons]
M. Altman & B. Kinney High Perf. Stat. Computing (v.9/10/11) 25
3. Organization of this Workshop
Motivation
Principles
Introduction to RCE
Measuring Resource Use
Scaling Up
Tuning Up
Scaling Out
(Parallelization)
Additional Resources
4. Nine Steps to Faster Results
1. Predict your resource needs through benchmarks, models, and algorithmic analysis
2. Select alternate algorithms when resource needs grow very rapidly with problem size
3. Identify resource bottlenecks using systems performance analysis tools
4. Address bottlenecks by increasing resources and/or changing program resource management
5. Discover hot-spots in programs using profiling tools
6. Adapt hot-spots to system architecture
7. Decompose the problem into independent subproblems
8. Distribute subproblems across pools of resources
9. Repeat analysis after making any changes
5. FREE! With every first class!
Coffee!
Chocolate!!
Consulting!!!
Time off for good behavior !!!!
6. IQSS (and affiliates) offer you support across all stages of your quantitative research:
Research design, including: design of surveys, selection of statistical methods
Primary and secondary data collection, including: the collection of geospatial and survey data
Data management, including: storage, cataloging, permanent archiving, and distribution
Data analysis, including: survey consulting, statistical software training, GIS consulting, high performance research computing
http://iq.harvard.edu/
7. The IQSS grants administration team helps with every aspect of the grant process. Contact us when you are planning your proposal.
Assisting in identifying research funding opportunities
Consulting on writing proposals
Assisting IQSS affiliates with:
preparation, review, and submission of all grant applications ("pre-award support")
management of their sponsored research portfolio ("post-award support")
Interpreting sponsor policies
Coordinating with FAS Research Administration and the Central Office for Sponsored Programs
… And, of course, support for seminars like this!
8. "One's Reach should exceed One's Grasp"
High Performance Statistical Computing: Principles
Leading-edge statistical methods (such as MCMC) can require lots of computing power
Ensuring robust results can multiply (and re-multiply) the number of analyses done:
Sensitivity analysis
Parameterization studies
Alternative models, Bayesian model averaging
Performance benchmarks provide information for budgeting computing $$$
9. “I Want it Now!”
Deadlines abound: conferences, trials, publication dates
New observations, variables, corrections, or model specifications may necessitate speedy reanalysis
10. "My strength is as the strength of ten because my heart is pure."
Selection of algorithms can change the nature of the computational resource usage
Tuning for a particular system can increase performance approximately ten-fold
In some circumstances work can be split across thousands of systems.
11. Principles
Goals matter
Problems matter
Algorithms matter
Answers matter
Architecture matters
12. Types of Performance Goals
Task completion time – wait time to finish
Efficiency – resource use for task
Throughput – work done by system overall
Latency – delay before response
Responsiveness – perception of response
Reliability – probability task/system will fail during time interval
"If you don't know where you're going, any road will take you." – Proverb
"If you come to a fork in the road, take it." – Yogi Berra
13. Performance Goals – Rules of Thumb
Rules of thumb:
Users of interactive software want responsiveness
Users of batch jobs want small completion times
Systems administrators want maximum throughput, reliability
Definitions:
Completion time: work(i)/resource(i)
Throughput: maximize (work/resource) for all jobs
Latency: time elapsed before first response to input
Real-time: complete task within fixed interval
"Responsiveness": perceived latency, task completion time, task progress indicators
14. Size of Factors Affecting Performance
If the runtime to solve a small instance of a problem (n=10) on a single system is one minute, how long will it take to solve a larger instance of n=1000?
Run time for the large instance (n=1000):
NP-Hard (worst case): 10^292 years
Very inefficient algorithm, O(N^3): 1.6 years
Inefficient algorithm, O(N^2): 16 hours
Very poor memory access patterns: 11 hours
Un-optimized code: 67 minutes
Optimized code: 7 minutes
Local multiprocessing: 2 minutes
Fully parallel / full cluster: 4 seconds
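The slide's back-of-the-envelope scaling can be sketched in a few lines. The helper name `extrapolate` is illustrative; the slide's exact figures also fold in constant factors, so this models only the dominant O(n^k) term:

```python
def extrapolate(t_small_s, n_small, n_large, exponent):
    # Scale a measured runtime by the dominant O(n^k) term:
    # t_large = t_small * (n_large / n_small)^k
    return t_small_s * (n_large / n_small) ** exponent

base = 60.0  # one minute to solve the n=10 instance
for k in (2, 3):
    secs = extrapolate(base, 10, 1000, k)
    print(f"O(N^{k}): {secs:,.0f} s = {secs / 86400:,.1f} days")
```

Running this shows why exponent matters: each extra power of N multiplies the runtime by another factor of 100 when the problem grows 100-fold.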
15. Problem Complexity Classes
Problem complexity class: the set of problems that can be solved in O(f(n)) for some f
More general than algorithmic complexity – encompasses all possible algorithms to solve the given problem
A polynomial-time algorithm is necessary for large problem instances
[Diagram: decision problems divide into decidable and undecidable; nested complexity classes: EXPSPACE, EXPTIME, PSPACE, CO-NP and NP (including NP-complete), BQP, P = BPP(?)]
16. Some Problems Are HARD
Traveling Salesperson Problem (weighted Hamiltonian cycle): Plot a route through N locations, visiting each once, that minimizes cost.
NP-Hard: worst-case instances require exponential time for an optimal, certain solution
NP-Complete: equivalent to a large class of hard problems
Source: Applegate, Bixby, Chvátal, and Cook (1998)
17. How to "Solve" the Unsolvable
Think small: use only a small number of cities. Aggregate to regions and treat as quasi-cities.
Restrict the problem: Euclidean distances are easier than travel cost.
Solve a different problem: minimum spanning tree.
Approximate the solution: for Euclidean distances, there is an algorithm based on the minimum spanning tree that is at most 50% longer.
Randomize: can a randomized algorithm find a solution with probability p? (No one knows…, probably not.)
Be lucky: maybe the "average" problem isn't that hard?
Heuristics: apply simulated annealing (etc.), cross fingers.
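The minimum-spanning-tree idea can be sketched in Python. This is the simpler "double-tree" heuristic (a preorder walk of the MST), which for metric instances is at most 100% longer than optimal; the 50% bound quoted above belongs to the stronger Christofides algorithm. All function names here are illustrative, not from the workshop materials:

```python
import math
import random

def mst_prim(points):
    # Prim's algorithm on the complete Euclidean graph.
    # Returns adjacency lists of the MST and its total weight.
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]
    weight = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        weight += best[u]
        if parent[u] >= 0:
            adj[parent[u]].append(u)
            adj[u].append(parent[u])
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v], parent[v] = dist(u, v), u
    return adj, weight

def mst_tour(points):
    # Preorder walk of the MST, shortcutting repeated vertices:
    # a 2-approximation for metric TSP (triangle inequality).
    adj, mst_weight = mst_prim(points)
    order, stack, seen = [], [0], {0}
    while stack:
        u = stack.pop()
        order.append(u)
        for v in reversed(adj[u]):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    tour = order + [0]
    length = sum(math.dist(points[a], points[b]) for a, b in zip(tour, tour[1:]))
    return tour, length, mst_weight

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(25)]
tour, length, mst_weight = mst_tour(pts)
print(f"cities: 25  MST: {mst_weight:.3f}  tour: {length:.3f} (at most 2x MST)")
```

By the triangle inequality the shortcut tour is never longer than twice the MST weight, and any tour minus one edge is a spanning tree, so the MST weight is also a lower bound on the optimum.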
18. How to recognize hard problems…
Is the problem routinely solved by existing systems?
Are efficient algorithms known?
Does it appear in lists of hard problems?
Is the problem universal? (Any computing problem, sufficiently generalized, is hard [Papadimitriou 1994])
Is run time growing exponentially in practice?
19. Algorithmic Complexity
Measures the complexity of a particular solution to a problem
Resource complexity: a measure of the resources used to solve a problem, as a function of input size
Common resource measures:
Time, usually represented as number of operations executed
Space, usually represented as number of discrete scalar values stored
20. Algorithmic Complexity: Sorting
bubbleSort(list)
  while (not finished) {
    finished <- true
    for i in (1 to length(list)-1) {
      if (list[i] > list[i+1]) {
        swap(list[i], list[i+1])
        finished <- false
      }
    }
  }
Number of operations: O(n^2)

quicksort(list)
  if (length(list) <= 1) return(list)
  select pivot from (list)
  for x in (list) {
    if x = pivot, add x to pivotList
    if x > pivot, add x to greaterList
    if x < pivot, add x to lessList
  }
  return(quicksort(lessList) + pivotList + quicksort(greaterList))
Number of operations: O(n log n) on average

*illustrations courtesy of wikipedia
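A quick way to feel the O(n^2) vs O(n log n) gap is to time both approaches on the same data. Python is used here for illustration; absolute timings will vary by machine:

```python
import random
import time

def bubble_sort(xs):
    # O(n^2) worst case: repeatedly swap adjacent out-of-order pairs.
    xs = list(xs)
    finished = False
    while not finished:
        finished = True
        for i in range(len(xs) - 1):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                finished = False
    return xs

data = [random.random() for _ in range(2000)]
t0 = time.perf_counter()
a = bubble_sort(data)               # quadratic
t1 = time.perf_counter()
b = sorted(data)                    # built-in O(n log n) sort
t2 = time.perf_counter()
print(f"bubble O(n^2): {t1 - t0:.3f}s   built-in O(n log n): {t2 - t1:.4f}s")
```

Even at n=2000 the gap is typically several orders of magnitude; doubling n roughly quadruples the bubble-sort time but barely moves the built-in sort.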
21. Sorting Complexity Continued
Tally sort: items in a fixed range, no duplicates
inlist = logical(length = max - min + 1)
for (i in 1:length(items)) { inlist[items[i] - min + 1] = TRUE }
for (i in min:max) { if (inlist[i - min + 1]) dowork(i) }
How fast is this?

Algorithm Recurse_sort(array L, i = 0, j = length(L)-1)
  if L[j] < L[i] then
    L[i] ↔ L[j]
  if j - i > 1 then
    t = (j - i + 1)/3
    Recurse_sort(L, i, j-t)
    Recurse_sort(L, i+t, j)
    Recurse_sort(L, i, j-t)
  return L
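The tally-sort pseudocode above can be made concrete. A minimal runnable sketch for distinct integers in a known range, running in O(n + range) time:

```python
def tally_sort(items, lo, hi):
    # Mark which values in [lo, hi] are present, then sweep the
    # range in order. Assumes distinct integers within the range.
    present = [False] * (hi - lo + 1)
    for x in items:
        present[x - lo] = True
    return [lo + i for i, flag in enumerate(present) if flag]

print(tally_sort([7, 2, 9, 4], 1, 10))  # -> [2, 4, 7, 9]
```

The cost is linear in the number of items plus the size of the range, so it beats comparison sorts only when the range is not much larger than the data.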
22. Answers Matter
Before optimization, verify the answer
"Right" can mean "right enough" if well-defined
Correct code may have different performance characteristics than incorrect code
Returning the wrong answer can always be done quickly
23. Simple von Neumann Architecture
[Diagram: Processor ↔ Memory ↔ Input/Output]
24. More Modern
[Diagram: multiple processors, each with cores (FPU, L1 cache) sharing an L2 cache; memory; a RAID controller with disks; a network card; a GPU]
26. Deep Inside the Core
27. Resource Hierarchy: Big, Fast, Cheap*
Registers (<1 KB)
Cache (1 MB)
RAM (10 gigabytes)
Local storage (10's of terabytes)
ONLINE storage (100's of petabytes)
OFFLINE storage (10's of exabytes)
• Big, Fast, Cheap – pick 2
• Latency increases with each step down
• Storage increases
• Throughput decreases (except with some offline storage)
28. Reading One Byte: x <- m[1,3]
CPU: loads 8 bytes into a register
Cache: 256-byte cache line
RAM: 4 KB page
Disk: 8 KB block from NFS (Networked File System)
29. General Performance Implications of Architecture
Talking to external devices can cause waits… (latency)
Information transmitted to the CPU is limited by the bus (throughput)
In practice, expect 80% of theoretical data-path bandwidth at best
Some optimizations are highly specific to architectural details
Hidden parallelism at low levels
Information travels in chunks (at least bus size)
Complexity makes theoretical performance analysis difficult – use benchmarks
30. From Principles to Practice
Practice = Principles * Optimization Goals * Problem Type * Computing Environment
Optimization Goals
Throughput
Latency
Reliability
Scaling up
Scaling out
Problem Decomposition
Independent data
Independent calculations
Coupled calculations
31. Principled Preparation Checklist
Verify that your problem is tractable:
Substitute an easier problem
Restrict or limit the problem
Be lucky or clever
Establish performance goals
Identify possible algorithms
What is their resource complexity?
Are better algorithms known?
Identify potential system characteristics
Communications costs
Systems resources
32. Lab 0: Problem Definition
Define your computing problem as formally as you can.
What algorithms are used to solve the problem?
What are your performance goals?
[Source: http://andreymath.wikidot.com/ Creative Commons ShareAlike License]
33. An Introduction to the IQSS RCE
High Performance Statistical Computing: Introduction to RCE
What is it?
Why use it?
How does it work?
How do we use it?
34. What is the RCE?
Virtual Desktop:
•Full virtual desktop environment – connect anywhere
•Many research software packages available
•Persistent session – connect anytime
Interactive Nodes:
•For large interactive jobs
•Large amounts of memory available on demand
•Stata, Matlab, Mathematica
•Easy to run from your virtual desktop
Batch Processing:
•Run hundreds of jobs at once
•Optimized for non-interactive, independent work
35. Why use the RCE?
For Research
An environment customized for quantitative social science research
A wide variety of research software packages are available
For Convenience
The RCE enables you to access a research desktop from almost any computer
Sessions are persistent – disconnect from your office, reconnect from home
File storage is central – never worry about which computer has your files
For Resources
Large analysis jobs are offloaded to high-powered servers
Large resource pools: 800 processors, 3.3 TB of memory, 40 TB of disk storage
Regularly updated software
For Collaboration
Offers an ideal environment for collaborative research projects
Share project files, desktops, software
For Reliability
System performance and availability are constantly monitored
Research files are regularly backed up and stored securely
IQSS has full-time staff dedicated to supporting the RCE
36. RCE Architecture
[Diagram: client sessions connect through login nodes to virtual desktops; jobs are dispatched to interactive nodes and batch nodes, all backed by shared disk]
37. RCE Architecture Rules of Thumb
Connect to the interactive pool
Small problems – run directly (on an interactive node)
Large-memory problems use interactive nodes
Interactive problems use interactive nodes
Large-compute jobs use batch submit – but the problem must be decomposed
38. RCE Powered Apps – How it Works
User clicks on an application from the menu
RCE checks for availability of interactive nodes
If a node is available: RCE submits a special condor job to the interactive master node (~30 s); a window appears on the RCE desktop and the application runs on the node
If no node is available: the user receives a notice and is offered a batch node to run the job; if the user clicks "yes", RCE submits a special condor job to the batch master node (~120 s)
39. RCE Desktop
Application Menu – application launching
Quick Launch – quick access to e-mail, web, and office applications
File Browser – graphical view of your home directory and files
HMDC Outage Notifier – updates to reflect status of environment
Desktop Shortcuts – shortcuts to home directory and trash
Status Bar – shows open applications
40. Login Nodes
Number of servers: 8
Number of processors: 32
RAM per session: ~6 GB
41. Apps on Login Nodes
Features:
Easiest way to launch applications
Limitations:
Smaller amounts of RAM
Competition for resources with interactive processes
42. Interactive Nodes
Number of servers: 13
Number of processors: 84
RAM per job: 1-64 GB
43. Apps on Interactive Nodes
Features:
More memory available for the application
Dedicated processor reduces competition for resources
Multiple cores available (e.g. for Stata-MP)
Limitations:
Interactive nodes are limited in number
Time limit on applications (currently 72 hours); time can be extended by request
44. Batch Nodes
Number of servers: 61
Number of processors: 258
RAM per job: 2-4 GB
45. Running Statistical Apps on Batch Nodes
Features:
Nearly 400 nodes can run at the same time
Well suited for loosely-coupled parallel problems
Limitations:
Memory is more limited
Application must be designed to harness the power of all nodes
No failover to other pools
46. Memory Limitations
Login nodes:
Each user on the machine is allowed to use a portion of available memory
No enforcement of login limits (can be oversubscribed)
Interactive/batch nodes:
Each node has a share of memory based on request
Physical hardware will only run a number of jobs equal to processor cores (not oversubscribed)
47. Get started with the RCE: Checklist
Apply for an RCE account: support@help.hmdc.harvard.edu
Install the free NX software
Connect to rce.hmdc.harvard.edu
Run interactive programs with menus
Run large interactive jobs with the "RCE Powered" menu
Run large batch jobs using a simple launcher script
48. Lab 1: Connecting to the RCE
In this lab, we will log in to the RCE and launch Stata on an interactive node.
[Source: http://andreymath.wikidot.com/ Creative Commons ShareAlike License]
49. Systems Resource Use
High Performance Statistical Computing: Analyzing Resource Use
Benchmarks
Timing
System resource monitoring
System resource limits
50. Benchmarks
What patterns of usage are likely to occur?
What are the 80% cases?
Are there 10% cases that have unusual patterns of data access, or unusual input?
Can you construct a plausible worst case?
Parameterize benchmarks:
Parameterize problem size
Vary order of magnitude
Create benchmarks based on real cases:
Use real problems for full benchmarking
Miniaturize real problems for quick tests
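A size-parameterized benchmark harness can be sketched in a few lines. The names `benchmark` and `timed` are illustrative, and the sorted-random-numbers workload is a stand-in for a real analysis:

```python
import random
import time

def timed(task, n):
    # Wall-clock time for a single run of task(n).
    t0 = time.perf_counter()
    task(n)
    return time.perf_counter() - t0

def benchmark(task, sizes, reps=3):
    # Run task(n) at each problem size, keeping the best of `reps`
    # timings; varying n by orders of magnitude exposes the
    # empirical scaling curve.
    return {n: min(timed(task, n) for _ in range(reps)) for n in sizes}

workload = lambda n: sorted(random.random() for _ in range(n))
for n, secs in benchmark(workload, sizes=[10**3, 10**4, 10**5]).items():
    print(f"n={n:>7}: {secs:.4f}s")
```

Plotting these timings against n (or log n) gives the performance curve discussed later; a clean power-law slope suggests the expected complexity class, and a kink suggests a resource bottleneck.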
51. Common Benchmarks
Artificial benchmarks
Simple "unit" benchmarks
Real application + random data
Real application + real data
Real application + worst-case data
Mix of applications
52. Timing
Why measure timings:
Direct or indirect measure of performance
Establish a baseline for changes
Empirical measure of scaling
Limitations:
Timers are often imprecise for brief events
Other activity on the system adds "noise"
Many tools aggregate all phases of execution
Many tools aggregate all areas of resource use
CPU timings may exclude system resource use
Must use condor_submit to run these on non-interactive nodes
Heisenbugs
53. Alternative: Queuing Models
A formalist's alternative to benchmarks
Can be useful for capacity planning
Model services as a network of queues:
Different classes of "customers"
Resources with different delay characteristics
Transition probabilities
Distribution of "service events"
Poisson events: discrete, independent, no memory
Number of events is Poisson distributed; interarrival time is exponentially distributed
Little's law: length of queue = arrival rate * time in queue
Limitations:
Heroic assumptions are often required
State-space explosion
Only the simplest models are solvable in closed form
Source: Takefusa, et al. 1999
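Little's law is simple enough to apply directly. A minimal worked example with made-up numbers:

```python
def queue_length(arrival_rate_per_s, time_in_system_s):
    # Little's law: L = lambda * W
    # (average jobs in system = arrival rate * average time in system)
    return arrival_rate_per_s * time_in_system_s

# Jobs arriving at 2 per second, each spending 30 s in the system,
# imply an average of 60 jobs in the system at any moment.
print(queue_length(2.0, 30.0))  # -> 60.0
```

The law holds regardless of the arrival or service distributions, which is why it is the first sanity check in capacity planning.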
54. Wall-Clock Time
Measure completion time
Show phases of execution by inserting calls:
Linux: date
OS X: date
Windows: DATE
R: Sys.time()
Stata: display "$S_TIME $S_DATE"
Matlab: clock; tic
C: time(), getitimer()
> print(Sys.time())
[1] "2010-04-28 10:21:45 EDT"
> res <- optim(sq, distance, genseq, method="SANN",
+ control = list(maxit=30000, temp=2000))
> print(Sys.time())
[1] "2010-04-28 10:21:55 EDT"
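The same bracketing pattern as the Sys.time() example above, sketched in Python with a monotonic wall-clock timer; the two phases are toy stand-ins for real load and analysis steps:

```python
import time

t0 = time.perf_counter()
data = [i ** 2 for i in range(1_000_000)]   # "load/prepare" phase
t1 = time.perf_counter()
total = sum(data)                            # "analysis" phase
t2 = time.perf_counter()
print(f"prepare: {t1 - t0:.3f}s   analysis: {t2 - t1:.3f}s")
```

Inserting timestamps between phases shows where wall-clock time actually goes, which aggregate end-to-end timing hides.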
55. CPU Time
Measure CPU time used by a program
Show "system"-state and "user"-state time
Some tools show other resources:
Linux: /usr/bin/time -v
OS X: /usr/bin/time -l
Windows: timeit.exe*
R: system.time()
Stata: timer
Matlab: cputime
C: getrusage()
$ /usr/bin/time -v /usr/local/stata11/stata -b mycommand.do
User time (seconds): 0.00
System time (seconds): 0.01
Percent of CPU this job got: 64%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
...
*Optional tool, may require installation on your system
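The user/system/wall split can also be measured from inside a program. A Python sketch using `os.times()`: the CPU-bound loop accrues user time, while the sleep adds wall-clock time but almost no CPU time, mimicking a process that waits on a resource:

```python
import os
import time

wall0 = time.perf_counter()
cpu0 = os.times()
_ = sum(i * i for i in range(2_000_000))   # CPU-bound work
time.sleep(0.2)                            # waiting: consumes no CPU
cpu1 = os.times()
wall1 = time.perf_counter()

user = cpu1.user - cpu0.user
system = cpu1.system - cpu0.system
print(f"user: {user:.2f}s  system: {system:.2f}s  wall: {wall1 - wall0:.2f}s")
```

As the next slide notes, wall time much larger than user + system is the signature of a bottleneck or a sleeping process.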
56. Interpreting CPU Time
User time (seconds): 0.00
System time (seconds): 0.01
Percent of CPU this job got: 64%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
If (system)/(system + user) > .1: possibly inefficient use of system calls, I/O
If elapsed time >> (system + user): possible resource bottleneck; possible sleep
If CPU percent is low: possible CPU contention
57. Monitoring Running Processes
Show a list of processes running
See current and accumulated CPU usage
See CPU utilization
Linux: top; gnome-system-monitor
OS X: top; Utilities -> "Activity Monitor"; atMonitor (3rd party, highly recommended)
Windows: taskmgr.exe; top.exe*
$ gnome-system-monitor &
*Optional tool, may require installation on your system
58. Interpreting Process Monitor Results
$ gnome-system-monitor &
Show the list of processes running
Sort processes by CPU use
Show the # of processes waiting to use the CPU
See current and accumulated CPU & memory usage
See CPU utilization
59. Sample Performance Curves
• Best case: linear in size of problem
• Nonlinearities could mean…
• inefficient algorithm (case 2)
• hard problem (case 3)
• poor data access patterns (case 4)
60. System Resource Monitoring
Why monitor system resources?
Identify bottlenecks
Identify processes using resources – may affect overall throughput and capacity
Identify processes actively using resources – may affect performance
Limitations:
Tools are often imprecise for brief events
Other activity on the system adds "noise"
Many tools aggregate all phases of execution
Many tools aggregate all system use
Many tools aggregate sub-resource use
Must use condor_submit to run these on non-interactive nodes
Heisenbugs
61. Monitoring System Resources
See system aggregated use and activity for memory, disk, network
See memory use by process
See resource use by process (varies by platform)
Linux: gnome-system-monitor; /usr/bin/time -v; sar; iostat; vmstat
OS X: Utilities -> "Activity Monitor"; /usr/bin/time -v; sar; iostat
Windows: perfmon.exe; taskmgr.exe
$ gnome-system-monitor &
$ sar -A 1 10
$ /usr/bin/time -v stata -b somefile.do
62. Detailed System Resource Tracing
See system use/calls for a process as it runs:
Linux: strace; SystemTap (add-on)
OS X: dtrace
Windows: procmon.exe (add-on)
$ strace -o strace.log myProgram
$ sudo dtrace -n 'syscall:::entry { @[execname] = count() }' -c ls
63. Interpreting Process Memory Use
$ gnome-system-monitor &
Use Monitor -> Preferences to add the "Resident Memory" column
Memory – amount of virtual memory requested
Resident Memory – amount of memory currently in RAM for the process
64. Interpreting System Activity
$ sar -bB 1 10
System memory activity:
01:37:57 PM pgpgin/s pgpgout/s fault/s majflt/s
01:37:58 PM 0.00 0.00 14.71 0.00
System disk activity:
01:37:57 PM tps rtps wtps bread/s bwrtn/s
01:37:58 PM 0.00 0.00 0.00 0.00 0.00
Page faults – indicate memory activity or resource contention
File I/O – indicates file activity
65. Interpreting System Activity
$ perfmon
Page faults – indicate memory activity or resource contention
File I/O – indicates file activity
66. Interpreting Process Resource Use
$ /usr/bin/time -v stata -b command
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 149
Voluntary context switches: 1280
Involuntary context switches: 460
Swaps: 0
File system inputs: 0
File system outputs: 0
Page faults – indicate memory activity or resource contention (often memory related)
Voluntary context switches – indicate waiting on I/O or memory
Swaps – indicate a severe system memory shortage
File I/O – indicates file activity (process disk I/O)
If a number is always 0 – it's a lie
67. Symptoms of a CPU-Bound System/Problem
CPU user+sys activity near 100% while there are active processes (if # of procs > # of CPUs)
Performance curve for your problem is continuous
This is usually good:
CPU is the most expensive resource
You can trust code-profiling reports
More likely to have gains from parallelization
However, if CPU %sys is high, suspect inefficient use of system calls, or borderline I/O or memory bottlenecks
68. Symptoms of Resource Bottleneck
Memory bottlenecks:
Severe: processes in swap queue (or waiting on swap); lots of space in use (see swap -m), swapping activity, free memory low
Moderate: high context switches + high page (validity) faults + active processes with memory >> resident memory
I/O bottlenecks:
Moderate: high %sys activity in CPU, high # of system calls, # of interrupts
Severe: I/O rate high; context switches, waits on I/O, or processes sleeping on I/O; physical disk activity high
Performance curve: discontinuous regions of accelerated performance decline
69. Tune Against Bottlenecks
Typically, a single resource will be the bottleneck point:
CPU
Memory
I/O: graphics, network, disk
If you don't address the bottleneck, optimizations elsewhere won't matter
Bottlenecks may depend on usage scenario and phase of operation
Fixing one bottleneck may reveal others
Don't expect speedup of the entire program to be proportional to the code you just tuned!
Programs interact; try to profile on a quiet system first
70. Resource Analysis: Checklist
Identify benchmarks:
Small instances of your problem
Can vary size
Target an isolated system:
Minimize other activity
Time benchmarks at various sizes
Monitor systems resources
Look for non-linearities in the performance curve
Look for bottlenecks
71. Lab: Analyzing Resource Use
In this lab, we will log in to the RCE, run a simple set of benchmarks, use timing tools and performance analysis, and identify bottlenecks and performance curves.
[Source: http://andreymath.wikidot.com/ Creative Commons ShareAlike License]
72. Scaling Up
High Performance Statistical Computing: Scaling Up
Addressing resource bottlenecks
System and application limits
Storing/accessing large datasets
Visualizing large datasets
73. When to Scale Up
If resource analysis identifies a memory bottleneck
If resource analysis identifies an I/O bottleneck (maybe…)
If problem size prevents the program from starting
If the program crashes or hangs in the middle of solving large problems (maybe…)
If planning ahead for significant usage changes:
- size of problem data > ~1/2 available physical memory (RAM)
- change of algorithm
- change of data structure
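The half-of-RAM rule of thumb is easy to check before starting a job. A back-of-the-envelope sketch (the 8 GB machine and the 10M x 100 dataset are assumed purely for illustration):

```python
def dataset_bytes(n_rows, n_cols, bytes_per_cell=8):
    """Rough in-memory size of a numeric dataset stored as
    double-precision floats (8 bytes per cell)."""
    return n_rows * n_cols * bytes_per_cell

ram = 8 * 2**30                          # assume an 8 GB machine
need = dataset_bytes(10_000_000, 100)    # 10M rows x 100 numeric columns
print(f"dataset ~{need / 2**30:.1f} GiB; "
      f"exceeds half of RAM: {need > ram / 2}")
```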
74. Addressing Memory Bottlenecks
Review: symptoms of a memory bottleneck
- Discontinuity in the performance curve
- Memory size of process increasing
- Resident memory size of process relatively large
- System activity shows memory activity
Principles of addressing memory bottlenecks
- Memory hierarchy
- Locality of reference
- Programming patterns
- Add more resources
- Modify data types
- Modify data structures
- Modify algorithms
75. Memory Hierarchy
Registers (<1 KB)
Cache (1 MB)
RAM (10s of gigabytes)
Local storage (10s of terabytes)
Online storage (100s of petabytes)
Offline storage (10s of exabytes)
If a register access took a second, tape access would take a few centuries.
[Tape ad: "Buy one, get 8092 free!"]
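The closing comparison can be reproduced with rough, assumed latencies (the numbers below are order-of-magnitude illustrations, not measurements of any particular system):

```python
# Rescale the memory hierarchy so that one register access takes one
# second, and see what the other levels become at that scale.
latencies_ns = {
    "registers": 1,                     # ~1 ns (assumed)
    "cache": 10,                        # ~10 ns (assumed)
    "RAM": 100,                         # ~100 ns (assumed)
    "local disk": 10_000_000,           # ~10 ms (assumed)
    "offline tape": 10_000_000_000,     # ~10 s to mount and seek (assumed)
}
seconds_per_year = 3600 * 24 * 365
for level, ns in latencies_ns.items():
    scaled_s = ns / latencies_ns["registers"]   # 1 ns -> 1 s
    print(f"{level:12s} {scaled_s:>14,.0f} s  "
          f"(~{scaled_s / seconds_per_year:,.2f} years)")
# Tape comes out around 317 years: "a few centuries", as the slide says.
```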
76. Locality of Reference
Temporal locality: reuse the same data elements
Spatial locality: use elements that are "near" each other in memory
What is "near"?
- For vectors and files: sequential ordering
- For matrices: either row or column ordering, depending on language
- For complex data structures: use experimentation and analysis
[Figure: row-major order]
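A quick way to see spatial locality is to traverse the same matrix in two orders. A Python sketch (Python nests lists row-wise, so row order is the "near" order here; column-major languages like R and Fortran are the opposite):

```python
import timeit

# Traverse a matrix (list of rows) in row order vs. column order.
# Both compute the same total; the row-order version follows the
# memory layout of nested lists and is typically faster.
N = 300
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_by_rows(m):
    return sum(x for row in m for x in row)

def sum_by_cols(m):
    n = len(m)
    return sum(m[i][j] for j in range(n) for i in range(n))

print("row order:", timeit.timeit(lambda: sum_by_rows(matrix), number=20))
print("col order:", timeit.timeit(lambda: sum_by_cols(matrix), number=20))
```

Interpreted Python hides much of the cache effect behind indexing overhead; in compiled code or with large numeric arrays the gap from traversal order alone is usually much larger.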
77. Adding More Resources
"$$$" optimization:
Buy more memory, or…
use the RCE to request a larger share
This is effective if local set size < share size
78. System and Application Resource Limits
Limits imposed by the system or application:
Virtual memory
- Logical memory space for a process
- Limits the maximum size of memory requested
- Can prevent a program from starting, or from loading large data
Physical memory
- Physical RAM installed in the system
- Usually smaller than virtual memory, but not always
- Determines the maximum efficient local set
Resident size limits
- Affect the maximum efficient local set, though not as severely as physical limits
79. Limits in Linux and OS X
Where limits are set:
- Set at bootup
- Set by the system at login: group/user-level total memory limits
- Set in the shell at process creation: request a new limit (up to the user maximum)
- Set in code via setrlimit
- Set in the application
Know your limits:
- Linux/OS X: ulimit -a
- R: none for Linux
- Stata: query memory
Limits on 32- vs. 64-bit systems:
- A 32-bit OS has a limit of 4 GB for virtual and physical memory
- A 64-bit OS:
No practical limit on virtual memory
Physical memory still limited by hardware configuration and design
Data structures may require more memory to store, since pointers and default data types are larger
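The setrlimit mechanism mentioned above can be exercised directly from code. A Unix-only Python sketch using the standard resource module (the same limits the shell's ulimit reports):

```python
import resource  # Unix-only standard-library module

# Query the process's virtual-memory limit via getrlimit, the
# programmatic counterpart of `ulimit -a`.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("virtual memory limit (soft, hard):", soft, hard)
# RLIM_INFINITY means "no limit" -- common on 64-bit systems.
print("unlimited:", soft == resource.RLIM_INFINITY)

# setrlimit takes (soft, hard); re-applying the current values is a
# no-op, while lowering the soft limit would cap future allocations.
resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```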
80. Limits in Windows Systems
Where limits are set:
- Limits implied by configuration at boot
- Virtual memory typically depends on the paging space configured on disk (the pagefile)
- R: memory.limit()
Limits on 32- vs. 64-bit systems:
- Most 32-bit Windows systems have a limit of 3 GB physical memory:
32-bit addressing allows 4 GB, but 1 GB is reserved for memory-mapped hardware, so only 3 GB is left over in most Windows configurations
- A 64-bit OS:
No practical limit on virtual memory (8 TB)
Physical memory still limited by hardware configuration and design
Data structures may require more memory to store, since pointers and default data types are larger
Some Windows applications are 32-bit versions, so they are still limited to 4 GB of virtual memory.
81. Basic Memory Management in Statistical Software

                           Matlab    R                              Stata
Memory limit               ---       memory.size() [Windows only]   set memory
Remove objects             CLEAR     rm()                           clear
Shrink data types          ---       as.integer(real_val),          compress
                                     as.factor(string_val)
Measure data size          ---       object.size(), gc()            memory
Order for virtual memory   PACK      gc()                           set virtual
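The "shrink data types" row can be illustrated outside the three packages. A Python sketch of the same idea (a packed array of 4-byte C ints vs. boxed floats, roughly what as.integer() or Stata's compress achieves), assuming values that fit in an int:

```python
import sys
from array import array

# The same values held as a list of boxed Python floats vs. a
# packed array of 4-byte C ints.
values = [float(i) for i in range(100_000)]
packed = array("i", (int(v) for v in values))

list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print("list of floats  :", list_bytes, "bytes")
print("packed int array:", sys.getsizeof(packed), "bytes")
# The packed form is several times smaller: the same principle as
# compress in Stata or as.integer()/as.factor() in R.
```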