ICME MapReduce Workshop
April 29 – May 1, 2013

David F. Gleich
Computer Science
Purdue University

Paul G. Constantine
Center for Turbulence Research
Stanford University

Website www.stanford.edu/~paulcon/icme-mapreduce-2013

David Gleich · Purdue
MRWorkshop
Goals
Learn the basics of MapReduce & Hadoop
Be able to process large volumes of data from
science and engineering applications 
… help enable you to explore on your own!
Workshop overview
Monday
  Me! Sparse matrix computations in MapReduce
  Austin Benson: Tall-and-skinny matrix computations in MapReduce
Tuesday
  Joe Buck: Extending MapReduce for scientific computing
  Chunsheng Feng: Large scale video analytics on pivotal Hadoop
Wednesday
  Joe Nichols: Post-processing CFD dynamics data in MapReduce
  Lavanya Ramakrishnan: Evaluating MapReduce and Hadoop for science
Sparse matrix computations in MapReduce

David F. Gleich
Computer Science
Purdue University

Slides online soon!
Code https://github.com/dgleich/mapreduce-matrix-tutorial
How to compute with big matrix data
A tale of two computers

                 ORNL 2010 Supercomputer    Google's 2010? Data computer
Cores            224k                       80k
Storage          10 PB drive                50 PB drive
Performance      1.7 Pflops                 ? Pflops
Power            7 MW                       ? MW
Interconnect     Custom interconnect        GB ethernet
Cost             $104 M                     $?? M
Disk per core    45 GB/core                 625 GB/core
                 (high CPU to disk)         (high disk to CPU)
My data computers

Nebula Cluster @ Sandia CA
2TB/core storage, 64 nodes, 256 cores, GB ethernet
Cost $150k

ICME Hadoop @ Stanford
3TB/core storage, 11 nodes, 44 cores, GB ethernet
Cost $30k

These systems are good for working with enormous matrix data!
These systems are good, but not great, for working with some enormous matrix data.
By 2013(?) all Fortune 500
companies will have a data
computer
How do you program them?
MapReduce and Hadoop overview
MapReduce is designed to
solve a different set of problems
from standard parallel libraries
The MapReduce programming model
Input a list of (key, value) pairs
Map apply a function f to all pairs
Reduce apply a function g to all values with key k (for all k)
Output a list of (key, value) pairs
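
The four steps above can be sketched in a few lines of single-machine Python. This is only an illustration of the model, not how Hadoop executes it: the name map_reduce, the toy functions f and g, and the in-memory "shuffle" are all assumptions made for the example.

```python
from collections import defaultdict

def map_reduce(pairs, f, g):
    """Run one in-memory MapReduce round over (key, value) pairs."""
    # Map: apply f to every input pair; f yields zero or more pairs.
    mapped = [out for key, value in pairs for out in f(key, value)]
    # Shuffle: group all mapped values by their key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: apply g to each key together with all of its values.
    return [out for key in sorted(groups) for out in g(key, groups[key])]

def f(key, value):
    # Toy map function: re-key each number by its parity.
    yield ("even" if value % 2 == 0 else "odd", value)

def g(key, values):
    # Toy reduce function: sum the values that share a key.
    yield (key, sum(values))

result = map_reduce(enumerate([1, 2, 3, 4, 5]), f, g)
print(result)
```

Both f and g only look at their arguments and yield pairs, which is what lets a real system run many copies of them in parallel.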



Computing a histogram
A simple MapReduce example

Input
  Key ImageId
  Value Pixels

Map(ImageId, Pixels)
  for each pixel
    emit
      Key = (r,g,b)
      Value = 1

Reduce(Color, Values)
  emit
    Key = Color
    Value = sum(Values)

Output
  Key Color
  Value # of pixels

[Figure: Map emits a 1 for every pixel, the shuffle groups the 1s by color, and Reduce sums each group into a pixel count per color.]
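
A minimal local sketch of the histogram job, assuming images are given as lists of (r,g,b) tuples. The two tiny images and the function names hist_map and hist_reduce are made up for illustration, and the shuffle is imitated by sorting on the key.

```python
from itertools import groupby
from operator import itemgetter

def hist_map(image_id, pixels):
    # Emit (color, 1) for every pixel of the image.
    for rgb in pixels:
        yield (rgb, 1)

def hist_reduce(color, values):
    # Sum the 1s that were emitted for this color.
    yield (color, sum(values))

# Hypothetical input: two tiny "images".
images = {
    "img0": [(255, 0, 0), (0, 0, 255), (255, 0, 0)],
    "img1": [(0, 0, 255), (255, 0, 0)],
}

mapped = [kv for image_id, pixels in images.items()
          for kv in hist_map(image_id, pixels)]
mapped.sort(key=itemgetter(0))  # stand-in for the shuffle: sort by color

histogram = {}
for color, group in groupby(mapped, key=itemgetter(0)):
    for key, count in hist_reduce(color, (v for _, v in group)):
        histogram[key] = count

print(histogram)
```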
Many matrix computations are possible in MapReduce

Column sums are easy
Input Key (i,j) Value Aij

Map((i,j), val)
  emit
    Key = j, Value = val

Reduce(j, Values)
  emit
    Key = j, Value = sum(Values)

"Coordinate storage"
A11 A12 A13 A14
A21 A22 A23 A24
A31 A32 A33 A34
A41 A42 A43 A44

(3,4) -> 5
(1,2) -> -6.0
(2,3) -> -1.2
(1,1) -> 3.14
…

Other basic methods can use common parallel/out-of-core algs
Sparse matrix-vector products y = Ax
Sparse matrix-matrix products C = AB
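
The column-sum job can be tried locally on the coordinate entries listed on the slide. This is a sketch under assumptions: the extra entry (4,4) -> 2 is hypothetical and was added only so that one column receives more than one value, and the grouping dictionary stands in for the shuffle.

```python
from collections import defaultdict

# Coordinate storage: ((i, j), Aij). The last entry is hypothetical.
entries = [((3, 4), 5), ((1, 2), -6.0), ((2, 3), -1.2), ((1, 1), 3.14),
           ((4, 4), 2)]

def colsum_map(key, val):
    # Drop the row index; key the value by its column j.
    i, j = key
    yield (j, val)

def colsum_reduce(j, values):
    yield (j, sum(values))

groups = defaultdict(list)  # stand-in for the shuffle
for key, val in entries:
    for j, v in colsum_map(key, val):
        groups[j].append(v)

column_sums = dict(kv for j in groups for kv in colsum_reduce(j, groups[j]))
print(column_sums)
```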
Beware of un-thoughtful ideas

Why so many limitations?
The MapReduce programming model
Input a list of (key, value) pairs
Map apply a function f to all pairs
Reduce apply a function g to all values with key k (for all k)
Output a list of (key, value) pairs

Map function f must be side-effect free
All map functions run in parallel
Reduce function g must be side-effect free
All reduce functions run in parallel
A graphical view of the MapReduce programming model
[Figure: blocks of data feed parallel Map tasks, which emit (key, value) pairs; the Shuffle groups the values by key; each Reduce task receives one key with its list of values and writes data.]
Data scalability
The idea
Bring the computations to the data
MR can schedule map functions without moving data.
[Figure: input blocks 1 to 5 are stored across machines; Maps (M) run on the machines that hold the blocks, followed by the Shuffle and the Reduce tasks (R).]
After waiting in the queue for a month and after 24 hours of finding eigenvalues, one node randomly hiccups.
heartbreak on node rs252
Fault tolerant
Redundant input helps make maps data-local
Just one type of communication: shuffle
Input stored in triplicate
Map output persisted to disk before shuffle
Reduce input/output on disk
Fault injection
[Figure: time to completion (sec) versus 1/Prob(failure), the mean number of successes per failure, for a 200M by 200 matrix and an 800M by 10 matrix, each with and without injected faults.]
With 1 in 5 tasks failing, the job only takes twice as long.
Computing a histogram
A simple MapReduce example

The entire dataset is "transposed" from images to pixels.
This moves the data to the computation!
(Using a combiner helps to reduce the data moved, but it cannot always be used)
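
To make the combiner remark concrete, here is a small local sketch that counts how many records would cross the shuffle with and without a map-side combiner. The images and the helper names are hypothetical. The combiner is valid here because summing is associative and commutative; a reduce step without that property (a median, say) could not reuse its logic on partial groups.

```python
from collections import Counter

def map_pixels(pixels):
    # One (color, 1) record per pixel.
    return [(rgb, 1) for rgb in pixels]

def combine(pairs):
    # Map-side partial sum: one record per distinct color in this map task.
    partial = Counter()
    for color, count in pairs:
        partial[color] += count
    return list(partial.items())

def reduce_counts(pairs):
    total = Counter()
    for color, count in pairs:
        total[color] += count
    return dict(total)

# Hypothetical input: each inner list is the work of one map task.
images = [
    [(255, 0, 0)] * 4 + [(0, 0, 255)] * 2,
    [(255, 0, 0)] * 3 + [(0, 255, 0)] * 1,
]

plain = [kv for img in images for kv in map_pixels(img)]
combined = [kv for img in images for kv in combine(map_pixels(img))]

print(len(plain), "records shuffled without a combiner")
print(len(combined), "records shuffled with a combiner")
```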

Hadoop and MapReduce are bad systems for some matrix computations.
How should you evaluate a MapReduce algorithm?
Build a performance model!

Measure the worst mapper
  Usually not too bad
Measure the data moved
  Could be very bad
Measure the worst reducer
  Could be very bad
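
One way to start such a model is to count records before running anything on a cluster. The sketch below is an assumption-laden toy: job_profile and the synthetic matrix are invented for illustration, and record counts stand in for bytes and time. It profiles the column-sum job on a matrix with one dense column, which is exactly the case where a single reducer gets most of the data.

```python
from collections import Counter

def job_profile(splits, mapper):
    """Return (worst mapper input, records shuffled, worst reducer input)."""
    worst_mapper = max(len(split) for split in splits)
    per_key = Counter()
    for split in splits:
        for record in split:
            for key, _ in mapper(record):
                per_key[key] += 1
    return worst_mapper, sum(per_key.values()), max(per_key.values())

def colsum_mapper(record):
    # Column-sum map: key each entry by its column index.
    (i, j), val = record
    yield (j, val)

# Hypothetical matrix: every row has an entry in column 0 (a dense column)
# and one more entry in a column between 1 and 10.
entries = [((i, 0), 1.0) for i in range(1000)]
entries += [((i, i % 10 + 1), 1.0) for i in range(1000)]
splits = [entries[k:k + 500] for k in range(0, len(entries), 500)]

profile = job_profile(splits, colsum_mapper)
print("worst mapper, data moved, worst reducer:", profile)
```

Here the mappers are balanced at 500 records each, but the reducer for column 0 receives half of everything that is shuffled.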
Tools I like
hadoop streaming
dumbo
mrjob
hadoopy
C++
Tools I don’t use but other
people seem to like …
pig
java
hbase
mahout
Eclipse
Cassandra

hadoop streaming
the map function is a program
(key,value) pairs are sent via stdin
output (key,value) pairs go to stdout

the reduce function is a program
(key,value) pairs are sent via stdin
keys are grouped
output (key,value) pairs go to stdout
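
A minimal sketch of what the two programs might look like for a word count, assuming the usual streaming text convention of one tab-separated key and value per line. The function names are hypothetical, and the sort between the two stages stands in for Hadoop's shuffle. On a cluster each function would live in its own script and be called with sys.stdin and sys.stdout.

```python
import io
from itertools import groupby

def run_mapper(stdin, stdout):
    # Map program: read raw lines, write "word<TAB>1" for every word.
    for line in stdin:
        for word in line.split():
            stdout.write("%s\t1\n" % word.lower())

def run_reducer(stdin, stdout):
    # Reduce program: lines arrive sorted, so equal keys are adjacent.
    rows = (line.rstrip("\n").split("\t", 1) for line in stdin)
    for key, group in groupby(rows, key=lambda kv: kv[0]):
        stdout.write("%s\t%d\n" % (key, sum(int(v) for _, v in group)))

# Local simulation of: mapper | sort | reducer
map_out = io.StringIO()
run_mapper(io.StringIO("a b a\nb a\n"), map_out)
shuffled = io.StringIO("".join(sorted(map_out.getvalue().splitlines(True))))
reduce_out = io.StringIO()
run_reducer(shuffled, reduce_out)
print(reduce_out.getvalue(), end="")
```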
mrjob from 
a wrapper around hadoop streaming for map and reduce functions in python

from mrjob.job import MRJob

class MRWordFreqCount(MRJob):
    def mapper(self, _, line):
        # emit (word, 1) for every word on the input line
        for word in line.split():
            yield (word.lower(), 1)

    def reducer(self, word, counts):
        # counts iterates over all the 1s emitted for this word
        yield (word, sum(counts))

if __name__ == '__main__':
    MRWordFreqCount.run()
How can Hadoop streaming possibly be fast?

           Iter 1       Iter 1          Iter 2          Overall
           QR (secs.)   Total (secs.)   Total (secs.)   Total (secs.)
Dumbo      67725        960             217             1177
Hadoopy    70909        612             118             730
C++        15809        350             37              387
Java       -            436             66              502

Synthetic data test 100,000,000-by-500 matrix (~500GB)
Codes implemented in MapReduce streaming
Matrix stored as TypedBytes lists of doubles
Python frameworks use Numpy+Atlas
Custom C++ TypedBytes reader/writer with Atlas
New non-streaming Java implementation too
All timing results from the Hadoop job tracker
C++ in streaming beats a native Java implementation.

Example available from github.com/dgleich/mrtsqr for verification
mrjob could be faster if it used typedbytes for intermediate storage; see https://github.com/Yelp/mrjob/pull/447
Code samples and short tutorials at
github.com/dgleich/mrmatrix
github.com/dgleich/mapreduce-matrix-tutorial
Matrix-vector product
Ax = y
$y_i = \sum_k A_{ik} x_k$
Follow along! mapreduce-matrix-tutorial/codes/smatvec.py
Matrix-vector product
A is stored by row (each line is a row index followed by column, value pairs)

$ head samples/smat_5_5.txt
0 0 0.125 3 1.024 4 0.121
1 0 0.597
2 2 1.247
3 4 -1.45
4 2 0.061

x is stored entry-wise (each line is an index followed by a value)

$ head samples/vec_5.txt
0 0.241
1 -0.98
2 0.237
3 -0.32
4 0.080
Matrix-vector product (in pictures)
Input: A and x
Map 1: align on columns
Reduce 1: output Aik xk, keyed on row i
Reduce 2: output sum(Aik xk), which is y
Matrix-vector product
Map 1: align on columns

def joinmap(self, key, line):
    vals = line.split()
    if len(vals) == 2:
        # the vector
        yield (vals[0],              # row
               (float(vals[1]),))    # xi
    else:
        # the matrix
        row = vals[0]
        for i in range(1, len(vals), 2):
            yield (vals[i],          # column
                   (row,             # i, Aij
                    float(vals[i+1])))
Matrix-vector product
Reduce 1: output Aik xk, keyed on row i

def joinred(self, key, vals):
    vecval = 0.
    matvals = []
    for val in vals:
        if len(val) == 1:
            vecval += val[0]
        else:
            matvals.append(val)
    for val in matvals:
        yield (val[0], val[1]*vecval)

Note that you should use a secondary sort to avoid reading both in memory
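
The secondary-sort idea can be sketched locally as follows. This is an assumption about how one might arrange it, not the tutorial's code: the map output key is extended to (column, tag), with tag 0 for the vector entry and tag 1 for matrix entries, records are sorted on the full key but grouped on the column alone, and so x_k arrives first and the matrix entries can be streamed without buffering them in a list. On Hadoop this also needs the partitioner to look at the column only, which is not shown here.

```python
from itertools import groupby

def join_streaming(records):
    """records: ((col, tag), payload) pairs, already sorted by (col, tag)."""
    for col, group in groupby(records, key=lambda r: r[0][0]):
        vecval = 0.0
        for (_, tag), payload in group:
            if tag == 0:
                vecval = payload           # x_k sorts first within the column
            else:
                row, aval = payload
                yield (row, aval * vecval)  # streamed, never stored

# A few records from the sample files, deliberately out of order.
records = [
    ((0, 1), (0, 0.125)), ((0, 0), 0.241), ((0, 1), (1, 0.597)),
    ((2, 1), (2, 1.247)), ((2, 0), 0.237),
]
records.sort(key=lambda r: r[0])  # stand-in for the framework's sort

products = list(join_streaming(records))
print(products)
```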

Matrix-vector product
Reduce 2: output sum(Aik xk), which is y

def sumred(self, key, vals):
    yield (key, sum(vals))
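
The three functions can be chained and checked on the sample files without Hadoop. The sketch below rewrites them as plain functions (no self, no mrjob) and uses a hypothetical run_round helper as the shuffle; the identity map in front of the second reduce is an assumption about how the two rounds connect.

```python
from collections import defaultdict

def run_round(records, mapper, reducer):
    # One in-memory map, shuffle, reduce round.
    groups = defaultdict(list)
    for rec in records:
        for key, val in mapper(rec):
            groups[key].append(val)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

def joinmap(line):
    vals = line.split()
    if len(vals) == 2:                       # vector entry: index, x_i
        yield (vals[0], (float(vals[1]),))
    else:                                    # matrix row: i, then (k, A_ik) pairs
        row = vals[0]
        for i in range(1, len(vals), 2):
            yield (vals[i], (row, float(vals[i + 1])))

def joinred(key, vals):
    vecval = 0.0
    matvals = []
    for val in vals:
        if len(val) == 1:
            vecval += val[0]
        else:
            matvals.append(val)
    for row, aval in matvals:
        yield (row, aval * vecval)

def identity(pair):
    yield pair

def sumred(key, vals):
    yield (key, sum(vals))

matrix_lines = ["0 0 0.125 3 1.024 4 0.121", "1 0 0.597", "2 2 1.247",
                "3 4 -1.45", "4 2 0.061"]
vector_lines = ["0 0.241", "1 -0.98", "2 0.237", "3 -0.32", "4 0.080"]

products = run_round(matrix_lines + vector_lines, joinmap, joinred)
y = dict(run_round(products, identity, sumred))
for i in sorted(y):
    print(i, y[i])
```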
Move the computations to the data? Not really!
Map 1 and Reduce 1: copy data once, now aligned on column
Reduce 2: copy data again, align on row
Matrix-matrix product
AB = C
$C_{ij} = \sum_k A_{ik} B_{kj}$
Follow along! mapreduce-matrix-tutorial/codes/matmat.py
Matrix-matrix product
A is stored by row

$ head samples/smat_10_5_A.txt
0 0 0.599 4 -1.53
1
2 2 0.260
3
4 0 0.267 1 0.839

B is stored by row

$ head samples/smat_5_5.txt
0 0 0.125 3 1.024 4 0.121
1 0 0.597
2 2 1.247
Matrix-matrix product (in pictures)
Input: A and B
Map 1: align on columns
Reduce 1: output Aik Bkj, keyed on (i,j)
Reduce 2: output sum(Aik Bkj), which is C
Matrix-matrix product (in code)
Map 1: align on columns

def joinmap(self, key, line):
    mtype = self.parsemat()
    vals = line.split()
    row = vals[0]
    rowvals = [(vals[i], float(vals[i+1]))
               for i in range(1, len(vals), 2)]
    if mtype == 1:
        # matrix A, output by col
        for val in rowvals:
            yield (val[0], (row, val[1]))
    else:
        # matrix B, output the whole row keyed on its row index
        yield (row, (rowvals,))
Matrix-matrix product (in code)
Reduce 1: output Aik Bkj, keyed on (i,j)

def joinred(self, key, vals):
    # load the data into memory
    brow = []
    acol = []
    for val in vals:
        if len(val) == 1:
            brow.extend(val[0])
        else:
            acol.append(val)

    for (bcol, bval) in brow:
        for (arow, aval) in acol:
            yield ((arow, bcol), aval*bval)
Matrix-matrix product (in code)
Reduce 2: output sum(Aik Bkj), which is C

def sumred(self, key, vals):
    yield (key, sum(vals))
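
As with the matrix-vector product, the whole pipeline can be checked locally. In this sketch the functions are plain Python rather than mrjob methods, the run_round helper stands in for the shuffle, and each input line carries an explicit "A" or "B" tag in place of self.parsemat(); those are all assumptions made for the example. A uses the five rows shown from smat_10_5_A.txt and B uses the five rows of smat_5_5.txt.

```python
from collections import defaultdict

def run_round(records, mapper, reducer):
    # One in-memory map, shuffle, reduce round.
    groups = defaultdict(list)
    for rec in records:
        for key, val in mapper(rec):
            groups[key].append(val)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

def joinmap(tagged_line):
    name, line = tagged_line                 # hypothetical tag instead of parsemat()
    vals = line.split()
    row = vals[0]
    rowvals = [(vals[i], float(vals[i + 1])) for i in range(1, len(vals), 2)]
    if name == "A":
        for col, aval in rowvals:            # A_ik keyed on column k
            yield (col, (row, aval))
    else:
        yield (row, (rowvals,))              # row k of B keyed on k

def joinred(key, vals):
    brow, acol = [], []
    for val in vals:
        if len(val) == 1:
            brow.extend(val[0])
        else:
            acol.append(val)
    for bcol, bval in brow:
        for arow, aval in acol:
            yield ((arow, bcol), aval * bval)

def identity(pair):
    yield pair

def sumred(key, vals):
    yield (key, sum(vals))

a_lines = ["0 0 0.599 4 -1.53", "1", "2 2 0.260", "3", "4 0 0.267 1 0.839"]
b_lines = ["0 0 0.125 3 1.024 4 0.121", "1 0 0.597", "2 2 1.247",
           "3 4 -1.45", "4 2 0.061"]
tagged = [("A", l) for l in a_lines] + [("B", l) for l in b_lines]

products = run_round(tagged, joinmap, joinred)
C = dict(run_round(products, identity, sumred))
for ij in sorted(C):
    print(ij, C[ij])
```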
Why is MapReduce so popular?

if (root) {
    PetscInt cur_nz=0;
    unsigned char* root_nz_buf;
    unsigned int *root_nz_buf_i,*root_nz_buf_j;
    double *root_nz_buf_v;
    PetscMalloc((sizeof(unsigned int)*2+sizeof(double))*root_nz_bufsize,&root_nz_buf);
    PetscMalloc(sizeof(unsigned int)*root_nz_bufsize,&root_nz_buf_i);
    PetscMalloc(sizeof(unsigned int)*root_nz_bufsize,&root_nz_buf_j);
    PetscMalloc(sizeof(double)*root_nz_bufsize,&root_nz_buf_v);

    unsigned long long int nzs_to_read = total_nz;

    while (send_rounds > 0) {
        // check if we are near the end of the file
        // and just read that amount
        size_t cur_nz_read = root_nz_bufsize;
        if (cur_nz_read > nzs_to_read) {
            cur_nz_read = nzs_to_read;
        }
        PetscInfo2(PETSC_NULL,"reading %i non-zeros of %lli\n", cur_nz_read, nzs_to_read);

600 lines of gross code in order to load a sparse matrix into memory, streaming from one processor.

MapReduce offers a better alternative
Thoughts on a better system
Default quadruple precision
Matrix computations without indexing
Easy setup of MPI data jobs
[Figure: timeline of an MPI job showing the initial data load of any MPI job next to the compute task.]
Double-precision floating point
was designed for the era
where “big” was 1000-10000
Error analysis of summation
s = 0; for i=1 to n: s = s + x[i]

$\mathrm{fl}(x + y) = (x + y)(1 + \delta), \quad |\delta| \le \mu$

$\left| \mathrm{fl}\left(\sum_i x_i\right) - \sum_i x_i \right| \le n \mu \sum_i |x_i|, \quad \mu \approx 10^{-16}$

A simple summation formula has error that is not always small if n is a billion
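
The growth of the error with n is easy to observe in double precision. The following sketch sums one million copies of 0.1 with the simple loop and measures the error exactly using rational arithmetic. Taking mu as 2^-53 (about 1.1e-16) is an assumption about what the slide's constant stands for, and the input is an arbitrary example.

```python
import math
from fractions import Fraction

n = 10**6
x = [0.1] * n

# The simple summation loop from the slide.
s = 0.0
for xi in x:
    s = s + xi

exact = Fraction(0.1) * n            # exact sum of the stored doubles
naive_error = abs(Fraction(s) - exact)
fsum_error = abs(Fraction(math.fsum(x)) - exact)

mu = 2.0 ** -53                      # assumed unit roundoff for doubles
bound = n * Fraction(mu) * exact     # n * mu * sum |x_i|

print("simple loop error:", float(naive_error))
print("math.fsum error:  ", float(fsum_error))
print("error bound:      ", float(bound))
```

The observed error sits well below the worst-case bound, but it is many orders of magnitude larger than the error of a correctly rounded sum.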
If your application matters then watch out for this issue.
Use quad-precision arithmetic or compensated summation instead.
Compensated Summation
"Kahan summation algorithm" on Wikipedia

s = 0.; c = 0.;
for i=1 to n:
    y = x[i] - c
    t = s + y
    c = (t - s) - y
    s = t

Mathematically, c is always zero.
On a computer, c can be non-zero
The parentheses matter!

$\left| \mathrm{fl}(\mathrm{csum}(x)) - \sum_i x_i \right| \le (\mu + n\mu^2) \sum_i |x_i|, \quad \mu \approx 10^{-16}$
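
A direct Python transcription of the loop above, compared against the simple loop on an arbitrary test input of one million copies of 0.1. The function name kahan_sum is made up for the example, and math.fsum is used only as an accurate reference value.

```python
import math

def kahan_sum(x):
    s = 0.0
    c = 0.0                  # running compensation for lost low-order bits
    for xi in x:
        y = xi - c
        t = s + y
        c = (t - s) - y      # the parentheses matter: this recovers the rounding error
        s = t
    return s

x = [0.1] * 10**6

naive = 0.0
for xi in x:
    naive += xi

reference = math.fsum(x)     # accurately rounded sum, used as ground truth
compensated = kahan_sum(x)

print("simple loop error:", abs(naive - reference))
print("compensated error:", abs(compensated - reference))
```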
Summary
MapReduce is a powerful but limited tool that has a role in the future of computational math.
… but it should be used carefully! See Austin's talk next!

Code samples and short tutorials at
github.com/dgleich/mrmatrix
github.com/dgleich/mapreduce-matrix-tutorial
David Gleich · Purdue 
53
MRWorkshop

Mais conteúdo relacionado

Mais procurados

Multi-Task Learning for NLP
Multi-Task Learning for NLPMulti-Task Learning for NLP
Multi-Task Learning for NLPMotoki Sato
 
Case study on machine learning
Case study on machine learningCase study on machine learning
Case study on machine learningHarshitBarde
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learningwgyn
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)Aditya pratap Singh
 
BackTracking Algorithm: Technique and Examples
BackTracking Algorithm: Technique and ExamplesBackTracking Algorithm: Technique and Examples
BackTracking Algorithm: Technique and ExamplesFahim Ferdous
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1Amrinder Arora
 
String Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive AlgorithmString Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive AlgorithmAdeel Rasheed
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksMark Scully
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelHemantha Kulathilake
 
Naive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer ScienceNaive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer ScienceTransweb Global Inc
 
Knapsack problem algorithm, greedy algorithm
Knapsack problem algorithm, greedy algorithmKnapsack problem algorithm, greedy algorithm
Knapsack problem algorithm, greedy algorithmHoneyChintal
 

Mais procurados (20)

String matching algorithm
String matching algorithmString matching algorithm
String matching algorithm
 
DAA 18CS42 VTU CSE
DAA 18CS42 VTU CSEDAA 18CS42 VTU CSE
DAA 18CS42 VTU CSE
 
Multi-Task Learning for NLP
Multi-Task Learning for NLPMulti-Task Learning for NLP
Multi-Task Learning for NLP
 
Case study on machine learning
Case study on machine learningCase study on machine learning
Case study on machine learning
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learning
 
Kmp
KmpKmp
Kmp
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)
 
BackTracking Algorithm: Technique and Examples
BackTracking Algorithm: Technique and ExamplesBackTracking Algorithm: Technique and Examples
BackTracking Algorithm: Technique and Examples
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1
 
Optimal binary search tree dynamic programming
Optimal binary search tree   dynamic programmingOptimal binary search tree   dynamic programming
Optimal binary search tree dynamic programming
 
13 Amortized Analysis
13 Amortized Analysis13 Amortized Analysis
13 Amortized Analysis
 
String Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive AlgorithmString Matching Algorithms-The Naive Algorithm
String Matching Algorithms-The Naive Algorithm
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
Daa unit 3
Daa unit 3Daa unit 3
Daa unit 3
 
Ai notes
Ai notesAi notes
Ai notes
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural Networks
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
 
Naive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer ScienceNaive String Matching Algorithm | Computer Science
Naive String Matching Algorithm | Computer Science
 
Knapsack problem algorithm, greedy algorithm
Knapsack problem algorithm, greedy algorithmKnapsack problem algorithm, greedy algorithm
Knapsack problem algorithm, greedy algorithm
 
Plsql lab mannual
Plsql lab mannualPlsql lab mannual
Plsql lab mannual
 

Destaque

Matrix methods for Hadoop
Matrix methods for HadoopMatrix methods for Hadoop
Matrix methods for HadoopDavid Gleich
 
How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...David Gleich
 
Sparse Matrix and Polynomial
Sparse Matrix and PolynomialSparse Matrix and Polynomial
Sparse Matrix and PolynomialAroosa Rajput
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networksDavid Gleich
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsDavid Gleich
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
 
SPARSE STORAGE RECOMMENDATION SYSTEM FOR SPARSE MATRIX VECTOR MULTIPLICATION ...
SPARSE STORAGE RECOMMENDATION SYSTEM FOR SPARSE MATRIX VECTOR MULTIPLICATION ...SPARSE STORAGE RECOMMENDATION SYSTEM FOR SPARSE MATRIX VECTOR MULTIPLICATION ...
SPARSE STORAGE RECOMMENDATION SYSTEM FOR SPARSE MATRIX VECTOR MULTIPLICATION ...IAEME Publication
 
Implementing page rank algorithm using hadoop map reduce
Implementing page rank algorithm using hadoop map reduceImplementing page rank algorithm using hadoop map reduce
Implementing page rank algorithm using hadoop map reduceFarzan Hajian
 
Face recognition and math
Face recognition and mathFace recognition and math
Face recognition and mathKejti Cela
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
LINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCHLINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCHDivyansh Verma
 
Direct tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDirect tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDavid Gleich
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
A multithreaded method for network alignment
A multithreaded method for network alignmentA multithreaded method for network alignment
A multithreaded method for network alignmentDavid Gleich
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceDavid Gleich
 
Iterative methods for network alignment
Iterative methods for network alignmentIterative methods for network alignment
Iterative methods for network alignmentDavid Gleich
 
A history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveA history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveDavid Gleich
 

Destaque (20)

Matrix methods for Hadoop
Matrix methods for HadoopMatrix methods for Hadoop
Matrix methods for Hadoop
 
How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...How does Google Google: A journey into the wondrous mathematics behind your f...
How does Google Google: A journey into the wondrous mathematics behind your f...
 
What is sparse matrix
What is sparse matrixWhat is sparse matrix
What is sparse matrix
 
Sparse Matrix and Polynomial
Sparse Matrix and PolynomialSparse Matrix and Polynomial
Sparse Matrix and Polynomial
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networks
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chains
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
SPARSE STORAGE RECOMMENDATION SYSTEM FOR SPARSE MATRIX VECTOR MULTIPLICATION ...
SPARSE STORAGE RECOMMENDATION SYSTEM FOR SPARSE MATRIX VECTOR MULTIPLICATION ...SPARSE STORAGE RECOMMENDATION SYSTEM FOR SPARSE MATRIX VECTOR MULTIPLICATION ...
SPARSE STORAGE RECOMMENDATION SYSTEM FOR SPARSE MATRIX VECTOR MULTIPLICATION ...
 
Implementing page rank algorithm using hadoop map reduce
Implementing page rank algorithm using hadoop map reduceImplementing page rank algorithm using hadoop map reduce
Implementing page rank algorithm using hadoop map reduce
 
Face recognition and math
Face recognition and mathFace recognition and math
Face recognition and math
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Sparse matrices
Sparse matricesSparse matrices
Sparse matrices
 
LINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCHLINEAR ALGEBRA BEHIND GOOGLE SEARCH
LINEAR ALGEBRA BEHIND GOOGLE SEARCH
 
HADOOP + R
HADOOP + RHADOOP + R
HADOOP + R
 
Direct tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architecturesDirect tall-and-skinny QR factorizations in MapReduce architectures
Direct tall-and-skinny QR factorizations in MapReduce architectures
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
A multithreaded method for network alignment
A multithreaded method for network alignmentA multithreaded method for network alignment
A multithreaded method for network alignment
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduce
 
Iterative methods for network alignment
Iterative methods for network alignmentIterative methods for network alignment
Iterative methods for network alignment
 
A history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspectiveA history of PageRank from the numerical computing perspective
A history of PageRank from the numerical computing perspective
 

Semelhante a Sparse matrix computations in MapReduce

MapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisMapReduce for scientific simulation analysis
MapReduce for scientific simulation analysisDavid Gleich
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraSomnath Mazumdar
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine ParallelismSri Prasanna
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansattilacsordas
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesKelly Technologies
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreKelly Technologies
 
Embarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel ProblemsEmbarrassingly/Delightfully Parallel Problems
Embarrassingly/Delightfully Parallel ProblemsDilum Bandara
 
Scala Meetup Hamburg - Spark
Scala Meetup Hamburg - SparkScala Meetup Hamburg - Spark
Scala Meetup Hamburg - SparkIvan Morozov
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Extending lifespan with Hadoop and R
Extending lifespan with Hadoop and RExtending lifespan with Hadoop and R
Extending lifespan with Hadoop and RRadek Maciaszek
 
Streaming Python on Hadoop
Streaming Python on HadoopStreaming Python on Hadoop
Streaming Python on HadoopVivian S. Zhang
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
Sparse matrix computations in MapReduce

  • 1. ICME MapReduce Workshop, April 29 – May 1, 2013. David F. Gleich, Computer Science, Purdue University. Paul G. Constantine, Center for Turbulence Research, Stanford University. Website: www.stanford.edu/~paulcon/icme-mapreduce-2013. David Gleich · Purdue · MRWorkshop
  • 2. Goals: Learn the basics of MapReduce & Hadoop. Be able to process large volumes of data from science and engineering applications. … help enable you to explore on your own!
  • 3. Workshop overview
      Monday: Me, Sparse matrix computations in MapReduce; Austin Benson, Tall-and-skinny matrix computations in MapReduce
      Tuesday: Joe Buck, Extending MapReduce for scientific computing; Chunsheng Feng, Large scale video analytics on Pivotal Hadoop
      Wednesday: Joe Nichols, Post-processing CFD dynamics data in MapReduce; Lavanya Ramakrishnan, Evaluating MapReduce and Hadoop for science
  • 4. Sparse matrix computations in MapReduce. David F. Gleich, Computer Science, Purdue University. Slides online soon! Code: https://github.com/dgleich/mapreduce-matrix-tutorial
  • 5. How to compute with big matrix data: a tale of two computers
                      ORNL 2010 Supercomputer    Google's 2010? Data computer
      Cores           224k                       80k
      Drive           10 PB                      50 PB
      Flops           1.7 Pflops                 ? Pflops
      Power           7 MW                       ? MW
      Network         Custom interconnect        GB ethernet
      Cost            $104 M                     $?? M
      Drive per core  45 GB/core                 625 GB/core
      Balance         High CPU to disk           High disk to CPU
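The per-core storage figures on this slide follow from dividing total drive capacity by core count. A quick check, assuming decimal units (1 PB = 1,000,000 GB); the function name is only for illustration:

```python
# Storage per core for the two machines on the slide.
# Assumption: decimal units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def storage_per_core_gb(storage_pb, cores):
    """Return drive capacity per core in GB."""
    return storage_pb * GB_PER_PB / cores


ornl = storage_per_core_gb(10, 224_000)    # ORNL 2010 supercomputer
google = storage_per_core_gb(50, 80_000)   # Google's 2010(?) data computer

print(f"ORNL:   {ornl:.0f} GB/core")
print(f"Google: {google:.0f} GB/core")
```

The roughly 14x gap in storage per core is what the slide means by "high CPU to disk" versus "high disk to CPU".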
  • 6. My data computers
      Nebula Cluster @ Sandia CA: 2TB/core storage, 64 nodes, 256 cores, GB ethernet. Cost $150k
      ICME Hadoop @ Stanford: 3TB/core storage, 11 nodes, 44 cores, GB ethernet. Cost $30k
      These systems are good for working with enormous matrix data!
  • 7. My data computers (annotated)
      Nebula Cluster @ Sandia CA: 2TB/core storage, 64 nodes, 256 cores, GB ethernet. Cost $150k
      ICME Hadoop @ Stanford: 3TB/core storage, 11 nodes, 44 cores, GB ethernet. Cost $30k
      These systems are good, but not great, for working with some enormous matrix data!
  • 8. By 2013(?) all Fortune 500 companies will have a data computer
  • 9. How do you program them?
  • 10. MapReduce and Hadoop overview
  • 11. MapReduce is designed to solve a different set of problems from standard parallel libraries
  • 12. The MapReduce programming model
      Input: a list of (key, value) pairs
      Map: apply a function f to all pairs
      Reduce: apply a function g to all values with key k (for all k)
      Output: a list of (key, value) pairs
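The model above can be sketched as a tiny in-memory simulator. This is not Hadoop, only an illustration of the map, group-by-key, reduce sequence; `run_mapreduce`, `f`, and `g` are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce pass in memory (illustration only)."""
    groups = defaultdict(list)
    # Map: apply the mapper to every input (key, value) pair.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Shuffle: collect all values that share a key.
            groups[out_key].append(out_value)
    # Reduce: apply the reducer to each key and its list of values.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def f(key, line):
    # Map function: emit (word, 1) for each word in the line.
    for word in line.split():
        yield (word, 1)


def g(word, counts):
    # Reduce function: sum the counts for one word.
    yield (word, sum(counts))


result = run_mapreduce([(0, "map reduce map"), (1, "reduce shuffle")], f, g)
print(result)
```

A real system runs the map calls and the reduce calls on many machines at once; the sorted loop here only stands in for that.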
  • 13. Computing a histogram: a simple MapReduce example
      Input: Key = ImageId, Value = Pixels
      Map(ImageId, Pixels): for each pixel, emit Key = (r,g,b), Value = 1
      Reduce(Color, Values): emit Key = Color, Value = sum(Values)
      Output: Key = Color, Value = # of pixels
      [Figure: Map emits a 1 for each pixel, the shuffle groups the 1s by color, and Reduce sums them into per-color counts.]
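A runnable version of the histogram pseudocode might look like the following. The two tiny "images" and all function names are made up for illustration, and the shuffle is simulated with a dictionary.

```python
from collections import defaultdict


def histogram_map(image_id, pixels):
    # Map(ImageId, Pixels): emit ((r, g, b), 1) for each pixel.
    for pixel in pixels:
        yield (pixel, 1)


def histogram_reduce(color, values):
    # Reduce(Color, Values): emit (Color, number of pixels).
    yield (color, sum(values))


def color_histogram(images):
    """Run map, an in-memory shuffle, and reduce over all images."""
    shuffled = defaultdict(list)
    for image_id, pixels in images.items():
        for color, one in histogram_map(image_id, pixels):
            shuffled[color].append(one)
    hist = {}
    for color, values in shuffled.items():
        for key, total in histogram_reduce(color, values):
            hist[key] = total
    return hist


# Hypothetical input: two images with a handful of pixels each.
RED, BLUE = (255, 0, 0), (0, 0, 255)
images = {"img1": [RED, RED, BLUE], "img2": [BLUE, RED]}
hist = color_histogram(images)
print(hist)
```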
  • 14. Many matrix computations are possible in MapReduce
      Column sums are easy
        Input: Key = (i,j), Value = Aij
        Map((i,j), val): emit Key = j, Value = val
        Reduce(j, Values): emit Key = j, Value = sum(Values)
      “Coordinate storage” of a 4-by-4 matrix A = [Aij]: (3,4) -> 5, (1,2) -> -6.0, (2,3) -> -1.2, (1,1) -> 3.14, …
      Other basic methods can use common parallel/out-of-core algorithms
        Sparse matrix-vector products y = Ax
        Sparse matrix-matrix products C = AB
  • 15. Many matrix computations are possible in MapReduce (same content as slide 14, with one added warning): Beware of un-thoughtful ideas
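The column-sum example can be sketched in a few lines. The first four entries are the ones shown on the slide; the entry (2,2) -> 1.0 is hypothetical, added so that one column has more than one value to sum.

```python
from collections import defaultdict


def colsum_map(index, val):
    # Map((i, j), val): emit Key = j, Value = val.
    i, j = index
    yield (j, val)


def colsum_reduce(j, values):
    # Reduce(j, Values): emit Key = j, Value = sum(Values).
    yield (j, sum(values))


# Coordinate storage: ((i, j), Aij) pairs.
entries = [
    ((3, 4), 5.0),
    ((1, 2), -6.0),
    ((2, 3), -1.2),
    ((1, 1), 3.14),
    ((2, 2), 1.0),   # hypothetical extra entry, not from the slide
]

# In-memory stand-in for the shuffle.
by_column = defaultdict(list)
for index, val in entries:
    for j, v in colsum_map(index, val):
        by_column[j].append(v)

column_sums = {}
for j, values in by_column.items():
    for key, total in colsum_reduce(j, values):
        column_sums[key] = total

print(column_sums)
```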
  • 16. Why so many limitations?
  • 17. The MapReduce programming model
      Input: a list of (key, value) pairs
      Map: apply a function f to all pairs
      Reduce: apply a function g to all values with key k (for all k)
      Output: a list of (key, value) pairs
      Map function f must be side-effect free! All map functions run in parallel
      Reduce function g must be side-effect free! All reduce functions run in parallel
  • 18. A graphical view of the MapReduce programming model
      [Figure: blocks of input data feed Map tasks, which emit (key, value) pairs; the Shuffle groups the values by key; each Reduce task receives a key with its values and writes output data.]
  • 19. Data scalability
      The idea: bring the computations to the data. MR can schedule map functions without moving data.
      [Figure: data blocks 1–5, the map tasks (M) that read them, the shuffle, and the reduce tasks (R).]
  • 20. After waiting in the queue for a month and after 24 hours of finding eigenvalues, one node randomly hiccups: heartbreak on node rs252
  • 21. Fault tolerant
      Redundant input helps make maps data-local
      Just one type of communication: shuffle
      Input stored in triplicate
      Map output persisted to disk before shuffle
      Reduce input/output on disk
  • 22. Fault injection
      [Plot: time to completion (sec) versus 1/Prob(failure), the mean number of successes per failure. Curves: No faults (200M by 200), Faults (200M by 200), No faults (800M by 10), Faults (800M by 10).]
      With 1/5 tasks failing, the job only takes twice as long.
  • 23. Data scalability
      The idea: bring the computations to the data. MR can schedule map functions without moving data.
      [Figure: data blocks 1–5, the map tasks (M) that read them, the shuffle, and the reduce tasks (R).]
  • 24. Computing a histogram: a simple MapReduce example
      Input: Key = ImageId, Value = Pixels
      Map(ImageId, Pixels): for each pixel, emit Key = (r,g,b), Value = 1
      Reduce(Color, Values): emit Key = Color, Value = sum(Values)
      Output: Key = Color, Value = # of pixels
      The entire dataset is “transposed” from images to pixels. This moves the data to the computation! (Using a combiner helps to reduce the data moved, but it cannot always be used)
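To illustrate the combiner remark, the sketch below counts how many (color, count) records would cross the shuffle with and without local pre-aggregation on each mapper. The pixel data and function names are hypothetical. A combiner like this one is safe here because summing is associative and commutative; reductions without that property are one case where it cannot be applied as-is.

```python
from collections import Counter


def map_pixels(pixels):
    # One (color, 1) record per pixel, as in the histogram example.
    return [(color, 1) for color in pixels]


def combine(pairs):
    # Combiner: pre-sum the counts produced by a single map task.
    local = Counter()
    for color, count in pairs:
        local[color] += count
    return list(local.items())


def reduce_counts(pairs):
    # Reducer: final per-color totals over all shuffled records.
    total = Counter()
    for color, count in pairs:
        total[color] += count
    return dict(total)


# Hypothetical images, with colors abbreviated to names.
images = {
    "img1": ["red"] * 4 + ["blue"] * 2,
    "img2": ["red"] * 3 + ["green"],
}

plain = [p for px in images.values() for p in map_pixels(px)]
combined = [p for px in images.values() for p in combine(map_pixels(px))]

print(len(plain), "records shuffled without a combiner")
print(len(combined), "records shuffled with a combiner")
print(reduce_counts(combined))
```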
  • 25. Hadoop and MapReduce are bad systems for some matrix computations.
  • 26. How should you evaluate a MapReduce algorithm? Build a performance model!
      Measure the worst mapper: usually not too bad
      Measure the data moved: could be very bad
      Measure the worst reducer: could be very bad
  • 27. Tools I like: hadoop streaming, dumbo, mrjob, hadoopy, C++
  • 28. Tools I don’t use but other people seem to like …: pig, java, hbase, mahout, Eclipse, Cassandra
  • 29. hadoop streaming
      The map function is a program: (key,value) pairs are sent via stdin; output (key,value) pairs go to stdout
      The reduce function is a program: (key,value) pairs are sent via stdin; keys are grouped; output (key,value) pairs go to stdout
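A minimal sketch of what such streaming programs do, assuming the common text convention of one tab-separated key/value pair per line. The functions take any iterable of lines so they can be tried locally; in an actual streaming job each would be a separate script looping over `sys.stdin` and printing to stdout. All names are illustrative.

```python
from itertools import groupby


def stream_mapper(lines):
    # Map program: read text lines, write "word<TAB>1" lines.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def stream_reducer(lines):
    # Reduce program: input arrives grouped (sorted) by key, so
    # consecutive lines with the same key can be summed on the fly.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Local stand-in for the framework: sorting plays the role of the shuffle.
mapped = list(stream_mapper(["Map reduce map", "shuffle Reduce"]))
shuffled = sorted(mapped)
reduced = list(stream_reducer(shuffled))
print(reduced)
```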
  • 30. mrjob: a wrapper around hadoop streaming for map and reduce functions in Python
      from mrjob.job import MRJob

      class MRWordFreqCount(MRJob):
          def mapper(self, _, line):
              # emit (word, 1) for every word on the input line
              for word in line.split():
                  yield (word.lower(), 1)

          def reducer(self, word, counts):
              # sum the 1s emitted for each word
              yield (word, sum(counts))

      if __name__ == '__main__':
          MRWordFreqCount.run()
  • 31. How can Hadoop streaming possibly be fast?
                 Iter 1 QR (secs.)  Iter 1 Total (secs.)  Iter 2 Total (secs.)  Overall Total (secs.)
      Dumbo      67725              960                   217                   1177
      Hadoopy    70909              612                   118                   730
      C++        15809              350                   37                    387
      Java       -                  436                   66                    502
      Synthetic data test: 100,000,000-by-500 matrix (~500GB)
      Codes implemented in MapReduce streaming
      Matrix stored as TypedBytes lists of doubles
      Python frameworks use Numpy+Atlas
      Custom C++ TypedBytes reader/writer with Atlas
      New non-streaming Java implementation too
      All timing results from the Hadoop job tracker
      C++ in streaming beats a native Java implementation.
      [Embedded slide: David Gleich (Sandia), MapReduce 2011]
      Example available from github.com/dgleich/mrtsqr for verification
      mrjob could be faster if it used typedbytes for intermediate storage; see https://github.com/Yelp/mrjob/pull/447
  • 32. Code samples and short tutorials at github.com/dgleich/mrmatrix and github.com/dgleich/mapreduce-matrix-tutorial
  • 33. Matrix-vector product: $Ax = y$, $y_i = \sum_k A_{ik} x_k$. Follow along: mapreduce-matrix-tutorial/codes/smatvec.py
  • 34. Matrix-vector product: $Ax = y$, $y_i = \sum_k A_{ik} x_k$. Follow along: mapreduce-matrix-tutorial/codes/smatvec.py
      A is stored by row
        $ head samples/smat_5_5.txt
        0 0 0.125 3 1.024 4 0.121
        1 0 0.597
        2 2 1.247
        3 4 -1.45
        4 2 0.061
      x is stored entry-wise
        $ head samples/vec_5.txt
        0 0.241
        1 -0.98
        2 0.237
        3 -0.32
        4 0.080
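To make the storage format concrete, here is a small single-machine sketch that parses the two sample files shown above and computes y = Ax directly. It is a reference computation, not the MapReduce code from the tutorial, and the function names are made up for this example.

```python
def parse_sparse_rows(lines):
    """Parse 'i j Aij j Aij ...' lines into {i: [(j, Aij), ...]}."""
    rows = {}
    for line in lines:
        vals = line.split()
        row = int(vals[0])
        rows[row] = [(int(vals[k]), float(vals[k + 1]))
                     for k in range(1, len(vals), 2)]
    return rows


def parse_vector(lines):
    """Parse 'k xk' lines into {k: xk}."""
    return {int(k): float(v) for k, v in (line.split() for line in lines)}


def matvec(rows, x):
    """Compute y_i = sum_k A_ik * x_k for every stored row i."""
    return {i: sum(a * x[k] for k, a in entries)
            for i, entries in rows.items()}


# Contents of samples/smat_5_5.txt and samples/vec_5.txt from the slide.
A_LINES = ["0 0 0.125 3 1.024 4 0.121", "1 0 0.597", "2 2 1.247",
           "3 4 -1.45", "4 2 0.061"]
X_LINES = ["0 0.241", "1 -0.98", "2 0.237", "3 -0.32", "4 0.080"]

y = matvec(parse_sparse_rows(A_LINES), parse_vector(X_LINES))
for i in sorted(y):
    print(i, round(y[i], 6))
```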
  • 35. Matrix-vector product (in pictures): $Ax = y$, $y_i = \sum_k A_{ik} x_k$
      Input: A and x
      Map 1: align on columns
      Reduce 1: output Aik xk, keyed on row i
      Reduce 2: output sum(Aik xk), which gives y
  • 36. Matrix-vector product (in pictures): $Ax = y$, $y_i = \sum_k A_{ik} x_k$
      Input: A and x. Map 1: align on columns
      def joinmap(self, key, line):
          vals = line.split()
          if len(vals) == 2:
              # the vector: a line "k xk"
              yield (vals[0],              # index k of the entry
                     (float(vals[1]),))    # xk, as a 1-tuple
          else:
              # the matrix: a line "i j Aij j Aij ..."
              row = vals[0]
              for i in range(1, len(vals), 2):
                  yield (vals[i],                     # column j
                         (row, float(vals[i+1])))     # (i, Aij)
  • 37. Matrix-vector product (in pictures): $Ax = y$, $y_i = \sum_k A_{ik} x_k$
      Reduce 1: output Aik xk, keyed on row i
      def joinred(self, key, vals):
          vecval = 0.
          matvals = []
          for val in vals:
              if len(val) == 1:
                  vecval += val[0]
              else:
                  matvals.append(val)
          for val in matvals:
              yield (val[0], val[1]*vecval)
      Note that you should use a secondary sort to avoid reading both in memory
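The secondary-sort remark can be illustrated with a streaming variant of the reducer. This is a sketch under the assumption that the framework delivers the vector entry (the 1-tuple) before any matrix entries for the same key; here a local `sorted(..., key=len)` stands in for that guarantee. How a secondary sort is actually configured depends on the framework and is not shown.

```python
def joinred_sorted(key, vals):
    """Reduce 1, assuming the vector entry arrives first for each key.

    Because xk is known before any matrix entry is seen, each product
    can be emitted immediately and nothing is buffered in memory.
    """
    xk = 0.0
    for val in vals:
        if len(val) == 1:
            xk = val[0]                  # vector entry x_k
        else:
            row, aik = val
            yield (row, aik * xk)        # A_ik * x_k, keyed on row i


# Values for column key "4" of the sample data, in arbitrary arrival order.
arrived = [("0", 0.121), (0.080,), ("3", -1.45)]

# Stand-in for the secondary sort: 1-tuples (vector) before 2-tuples (matrix).
ordered = sorted(arrived, key=len)
products = list(joinred_sorted("4", ordered))
print(products)
```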
  • 38. Matrix-vector product (in pictures): $Ax = y$, $y_i = \sum_k A_{ik} x_k$
      Reduce 2: output sum(Aik xk), which gives y
      def sumred(self, key, vals):
          yield (key, sum(vals))
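The three functions from the slides can be chained locally to see the whole pipeline run. The sketch below drops the `self` argument and replaces Hadoop's shuffle with a dictionary; `shuffle` and `matvec_mapreduce` are hypothetical helpers, and the input is the sample data from the earlier slide.

```python
from collections import defaultdict


def joinmap(line):
    vals = line.split()
    if len(vals) == 2:                       # vector line "k xk"
        yield (vals[0], (float(vals[1]),))
    else:                                    # matrix line "i j Aij ..."
        row = vals[0]
        for i in range(1, len(vals), 2):
            yield (vals[i], (row, float(vals[i + 1])))


def joinred(key, vals):
    vecval, matvals = 0.0, []
    for val in vals:
        if len(val) == 1:
            vecval += val[0]
        else:
            matvals.append(val)
    for row, aik in matvals:
        yield (row, aik * vecval)


def sumred(key, vals):
    yield (key, sum(vals))


def shuffle(pairs):
    """In-memory stand-in for the shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def matvec_mapreduce(lines):
    stage1 = shuffle(p for line in lines for p in joinmap(line))
    products = [p for k, vals in stage1.items() for p in joinred(k, vals)]
    stage2 = shuffle(products)
    return {k: v for key, vals in stage2.items() for k, v in sumred(key, vals)}


lines = ["0 0 0.125 3 1.024 4 0.121", "1 0 0.597", "2 2 1.247",
         "3 4 -1.45", "4 2 0.061",
         "0 0.241", "1 -0.98", "2 0.237", "3 -0.32", "4 0.080"]
y = matvec_mapreduce(lines)
print({k: round(v, 6) for k, v in sorted(y.items())})
```

Note that keys stay strings throughout, as they do in the slide code, and that the data passes through two shuffles before y is complete.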
  • 39. Move the computations to the data? Not really!
      Map 1 (align on columns): copy data once, now aligned on column
      Reduce 1 (output Aik xk, keyed on row i): copy data again, align on row
      Reduce 2: output sum(Aik xk), which gives y
  • 40. Matrix-matrix product: $AB = C$, $C_{ij} = \sum_k A_{ik} B_{kj}$. Follow along: mapreduce-matrix-tutorial/codes/matmat.py
  • 41. Matrix-matrix product: $AB = C$, $C_{ij} = \sum_k A_{ik} B_{kj}$. Follow along: mapreduce-matrix-tutorial/codes/matmat.py
      A is stored by row
        $ head samples/smat_10_5_A.txt
        0 0 0.599 4 -1.53
        1
        2 2 0.260
        3
        4 0 0.267 1 0.839
      B is stored by row
        $ head samples/smat_5_5.txt
        0 0 0.125 3 1.024 4 0.121
        1 0 0.597
        2 2 1.247
  • 42. Matrix-matrix product (in pictures): $AB = C$, $C_{ij} = \sum_k A_{ik} B_{kj}$
      Map 1: align on columns
      Reduce 1: output Aik Bkj, keyed on (i,j)
      Reduce 2: output sum(Aik Bkj), which gives C
  • 43. Matrix-matrix product (in code): $AB = C$, $C_{ij} = \sum_k A_{ik} B_{kj}$
      Map 1: align on columns
      def joinmap(self, key, line):
          mtype = self.parsemat()
          vals = line.split()
          row = vals[0]
          rowvals = [(vals[i], float(vals[i+1]))
                     for i in range(1, len(vals), 2)]
          if mtype == 1:
              # matrix A, output by col: key k, value (i, Aik)
              for val in rowvals:
                  yield (val[0], (row, val[1]))
          else:
              # matrix B, output the whole row: key k, value ([(j, Bkj), ...],)
              yield (row, (rowvals,))
  • 44. Matrix-matrix product (in code). $AB = C$, $C_{ij} = \sum_k A_{ik} B_{kj}$. Input: A and B. Map 1: align on columns. Reduce 1: output $A_{ik} B_{kj}$, keyed on $(i,j)$.
      def joinred(self, key, vals):
          # load the data into memory
          brow = []
          acol = []
          for val in vals:
              if len(val) == 1:
                  brow.extend(val[0])
              else:
                  acol.append(val)

          for (bcol,bval) in brow:
              for (arow,aval) in acol:
                  yield ((arow,bcol), aval*bval)
  • 45. Matrix-matrix product (in pictures). $AB = C$, $C_{ij} = \sum_k A_{ik} B_{kj}$. Input: A and B. Map 1: align on columns. Reduce 1: output $A_{ik} B_{kj}$, keyed on $(i,j)$. Reduce 2: output sum($A_{ik} B_{kj}$), giving C.
      def sumred(self, key, vals):
          yield (key, sum(vals))
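The map and two reduce steps for the matrix-matrix product can be chained in memory to check the logic on a small case. This is a sketch under stated assumptions rather than the tutorial's `matmat.py`: `run_stage` and `matmat` are hypothetical, the matrix type is passed to `joinmap` directly instead of coming from `self.parsemat()`, and rows arrive already parsed as lists of (column, value) pairs.

```python
from collections import defaultdict

def run_stage(records, reducer):
    """Group (key, value) pairs by key, then apply the reducer to each group."""
    groups = defaultdict(list)
    for key, val in records:
        groups[key].append(val)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out

def joinmap(mtype, row, rowvals):
    if mtype == 1:
        # matrix A: emit each entry keyed on its column k
        for col, val in rowvals:
            yield (col, (row, val))
    else:
        # matrix B: emit the whole row k as a 1-tuple
        yield (row, (rowvals,))

def joinred(key, vals):
    # key is k; collect column k of A and row k of B, then form all products
    brow, acol = [], []
    for val in vals:
        if len(val) == 1:
            brow.extend(val[0])
        else:
            acol.append(val)
    for bcol, bval in brow:
        for arow, aval in acol:
            yield ((arow, bcol), aval * bval)

def sumred(key, vals):
    yield (key, sum(vals))

def matmat(A_rows, B_rows):
    """A_rows, B_rows: dict {row: [(col, value), ...]}. Returns {(i, j): C_ij}."""
    mapped = []
    for row, rowvals in A_rows.items():
        mapped.extend(joinmap(1, row, rowvals))
    for row, rowvals in B_rows.items():
        mapped.extend(joinmap(2, row, rowvals))
    partial = run_stage(mapped, joinred)      # Reduce 1
    return dict(run_stage(partial, sumred))   # Reduce 2

# A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]]
A = {0: [(0, 1.0), (1, 2.0)], 1: [(1, 3.0)]}
B = {0: [(0, 4.0)], 1: [(0, 5.0), (1, 6.0)]}
C = matmat(A, B)
```

As in the slide code, the first reducer holds one column of A and one row of B in memory at a time, which is the part that may not scale for dense columns or rows.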
  • 46. Why is MapReduce so popular?
      if (root) {
        PetscInt cur_nz=0;
        unsigned char* root_nz_buf;
        unsigned int *root_nz_buf_i,*root_nz_buf_j;
        double *root_nz_buf_v;
        PetscMalloc((sizeof(unsigned int)*2+sizeof(double))*root_nz_bufsize,&root_nz_buf);
        PetscMalloc(sizeof(unsigned int)*root_nz_bufsize,&root_nz_buf_i);
        PetscMalloc(sizeof(unsigned int)*root_nz_bufsize,&root_nz_buf_j);
        PetscMalloc(sizeof(double)*root_nz_bufsize,&root_nz_buf_v);

        unsigned long long int nzs_to_read = total_nz;

        while (send_rounds > 0) {
          // check if we are near the end of the file
          // and just read that amount
          size_t cur_nz_read = root_nz_bufsize;
          if (cur_nz_read > nzs_to_read) {
            cur_nz_read = nzs_to_read;
          }
          PetscInfo2(PETSC_NULL,"reading %i non-zeros of %lli\n", cur_nz_read, nzs_to_read);
    600 lines of gross code in order to load a sparse matrix into memory, streaming from one processor. MapReduce offers a better alternative.
  • 47. Thoughts on a better system: default quadruple precision; matrix computations without indexing; easy setup of MPI data jobs. Figure labels: initial data load of any MPI job; compute task.
  • 48. Double-precision floating point was designed for the era where “big” was 1000-10000
  • 49. Error analysis of summation.
      s = 0;
      for i=1 to n:
        s = s + x[i]
    A simple summation formula has error that is not always small if n is a billion: $\mathrm{fl}(x + y) = (x + y)(1 + \delta)$ with $|\delta| \le \mu$, so $\left|\mathrm{fl}\left(\sum_i x_i\right) - \sum_i x_i\right| \le n\mu \sum_i |x_i|$, where $\mu \approx 10^{-16}$.
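As a rough worked instance of the bound on this slide, take $n = 10^9$ terms (the slide's "a billion") and $\mu \approx 10^{-16}$. The numbers are illustrative only:

```latex
\[
  n\mu \approx 10^{9} \times 10^{-16} = 10^{-7}
  \quad\Longrightarrow\quad
  \Bigl|\mathrm{fl}\Bigl(\sum_i x_i\Bigr) - \sum_i x_i\Bigr|
  \le 10^{-7} \sum_i |x_i| .
\]
```

So the bound guarantees only about seven correct digits relative to $\sum_i |x_i|$, and fewer relative to the sum itself when terms of opposite sign cancel. This is a worst-case bound; the error observed in practice is often smaller.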
  • 50. If your application matters then watch out for this issue. Use quad-precision arithmetic or compensated summation instead.
  • 51. Compensated summation (see “Kahan summation algorithm” on Wikipedia).
      s = 0.; c = 0.;
      for i=1 to n:
        y = x[i] - c
        t = s + y
        c = (t - s) - y
        s = t
    Mathematically, c is always zero. On a computer, c can be non-zero. The parentheses matter! $\left|\mathrm{fl}(\mathrm{csum}(x)) - \sum_i x_i\right| \le (\mu + n\mu^2) \sum_i |x_i|$, where $\mu \approx 10^{-16}$.
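The pseudocode on this slide translates almost line for line into Python. The sketch below compares it against a plain running sum, using `math.fsum` from the standard library as a correctly rounded reference. The function names and the test data (one million copies of 0.1) are choices made here for illustration. The plain loop is written out by hand because the built-in `sum` in recent Python versions may itself apply a compensated scheme to floats.

```python
import math

def naive_sum(xs):
    """Plain left-to-right accumulation, as on the error-analysis slide."""
    s = 0.0
    for x in xs:
        s = s + x
    return s

def kahan_sum(xs):
    """Compensated (Kahan) summation, following the slide's pseudocode."""
    s = 0.0
    c = 0.0                # running estimate of the lost low-order bits
    for x in xs:
        y = x - c          # apply the correction carried from the last step
        t = s + y          # low-order bits of y may be lost here
        c = (t - s) - y    # recover what was lost; the parentheses matter
        s = t
    return s

data = [0.1] * 10**6
exact = math.fsum(data)            # correctly rounded reference value
naive = naive_sum(data)
compensated = kahan_sum(data)

print("naive error:      ", abs(naive - exact))
print("compensated error:", abs(compensated - exact))
```

On this input the compensated error should come out far smaller than the naive one, in line with the two bounds on the slides.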
  • 52. Summary. MapReduce is a powerful but limited tool that has a role in the future of computational math … but it should be used carefully! See Austin’s talk next! Code samples and short tutorials at github.com/dgleich/mrmatrix and github.com/dgleich/mapreduce-matrix-tutorial