Randomized Regression
in Parallel and Distributed Environments
Michael W. Mahoney
Stanford University
(For more info, see: http://cs.stanford.edu/people/mmahoney/ or Google on “Michael Mahoney”)

November 2013

Motivation: larger-scale graph analytics
Leskovec, Lang, Dasgupta, and Mahoney, “Community Structure in Large Networks” Internet Mathematics, (2009)
How people think of networks:

What real networks look like:

True on 10^3-node graphs: even small graphs have “bad” properties
True on 10^6-node graphs: where many/most real applications are interested
True on 10^9-node graphs: presumably, but we don’t have tools to test it
Goal:
Go beyond “simple statistics,” e.g., triangles-closing and degree distribution
Develop more sophisticated “interactive analytics” methods for large-scale data

Goal: very large-scale “vector space analytics”
Small-scale and medium-scale:
Model data by graphs and matrices
Compute eigenvectors, correlations, etc. in RAM
Very large-scale:
Model data with flat tables and the relational model
Compute with join/select and other “counting” in, e.g., Hadoop
Can we “bridge the gap” and do “vector space computations” at very large scale?
Not obviously yes: exactly computing eigenvectors, correlations, etc. is subtle and uses lots of communication.
Not obviously no: the lesson from random sampling algorithms is that you can get a (1 + ε)-approximation of optimal with very few samples.

Why randomized matrix algorithms?
Traditional matrix algorithms (direct & iterative methods, interior point,
simplex, etc.) are designed to work in RAM and their performance is
measured in floating-point operations per second (FLOPS).
Traditional algorithms are NOT well-suited for:
problems that are very large
distributed or parallel computation
when communication is a bottleneck
when the data must be accessed via “passes”

Randomized matrix algorithms are:
faster: better theory
simpler: easier to implement
implicitly regularize: noise in the algorithm avoids overfitting
inherently parallel: exploiting modern computer architectures
more scalable: modern massive data sets
Over-determined/over-constrained regression
An ℓp regression problem is specified by a design matrix A ∈ R^{m×n}, a response vector b ∈ R^m, and a norm ‖·‖_p:

    minimize_{x ∈ R^n} ‖Ax − b‖_p.

Assume m ≫ n, i.e., many more “constraints” than “variables.” Given an ε > 0, find a (1 + ε)-approximate solution x̂ in relative scale, i.e.,

    ‖Ax̂ − b‖_p ≤ (1 + ε) ‖Ax* − b‖_p,

where x* is a/the optimal solution.
p = 2: Least Squares Approximation: Very widely-used, but highly
non-robust to outliers.
p = 1: Least Absolute Deviations: Improved robustness, but at the
cost of increased complexity.
Strongly rectangular data

Some examples:

application    | m                                     | n
SNP            | number of SNPs (10^6)                 | number of subjects (10^3)
TinyImages     | number of pixels in each image (10^3) | number of images (10^8)
PDE            | number of degrees of freedom          | number of time steps
sensor network | size of sensing data                  | number of sensors
NLP            | number of words and n-grams           | number of principal components

More generally:
Over-constrained ℓ1/ℓ2 regression is a good model for implementing other matrix algorithms (low-rank approximation, matrix completion, generalized linear models, etc.) in large-scale settings.

Large-scale environments and how they scale
Shared memory
  cores: [10, 10^3] [1]
  memory: [100GB, 100TB]

Message passing
  cores: [200, 10^5] [2]
  memory: [1TB, 1000TB]
  CUDA cores: [5 × 10^4, 3 × 10^6] [3]
  GPU memory: [500GB, 20TB]

MapReduce
  cores: [40, 10^5] [4]
  memory: [240GB, 100TB]
  storage: [100TB, 100PB] [5]

Distributed computing
  cores: [−, 3 × 10^5] [6]

[1] http://www.sgi.com/pdfs/4358.pdf
[2] http://www.top500.org/list/2011/11/100
[3] http://i.top500.org/site/50310
[4] http://www.cloudera.com/blog/2010/04/pushing-the-limits-of-distributed-processing/
[5] http://hortonworks.com/blog/an-introduction-to-hdfs-federation/
[6] http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats
Outline

1. Randomized regression in RAM
2. Solving ℓ2 regression using MPI
3. Solving ℓ1 and quantile regression on MapReduce
Two important notions: leverage and condition
(Mahoney, “Randomized Algorithms for Matrices and Data,” FnTML, 2011.)

Statistical leverage. (Think: eigenvectors. Important for low-precision.)
The statistical leverage scores of A (assume m ≫ n) are the diagonal elements of the projection matrix onto the column span of A.
They equal the ℓ2-norm-squared of the rows of any orthogonal basis spanning A.
They measure:
  how well-correlated the singular vectors are with the canonical basis
  which constraints have largest “influence” on the LS fit
  a notion of “coherence” or “outlierness”
Computing them exactly is as hard as solving the LS problem.

Condition number. (Think: eigenvalues. Important for high-precision.)
The ℓ2-norm condition number of A is κ(A) = σmax(A)/σmin⁺(A), where σmin⁺ is the smallest nonzero singular value.
κ(A) bounds the number of iterations; for ill-conditioned problems (e.g., κ(A) ≈ 10^6 ≫ 1), the convergence speed is very slow.
Computing κ(A) is generally as hard as solving the LS problem.

These are for the ℓ2-norm. Generalizations exist for the ℓ1-norm.
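A minimal sketch (not from the slides) of computing the exact ℓ2 leverage scores via a thin QR factorization; for tall dense A this costs O(mn^2), which is exactly the “as hard as solving the LS problem” point above. The function name is illustrative.

Code sketch (Python):

import numpy as np

def leverage_scores(A):
    # squared row norms of an orthonormal basis Q for span(A),
    # i.e., the diagonal of the projection matrix Q Q^T
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

A = np.random.randn(10000, 50)
lev = leverage_scores(A)
print(lev.sum())  # sums to n = 50 for full-rank A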
Meta-algorithm for ℓ2-norm regression (1 of 3)
(Drineas, Mahoney, etc., 2006, 2008, etc., starting with SODA 2006; Mahoney FnTML, 2011.)

1: Using the ℓ2 statistical leverage scores of A, construct an importance sampling distribution {p_i}_{i=1}^m.
2: Randomly sample a small number of constraints according to {p_i}_{i=1}^m to construct a subproblem.
3: Solve the ℓ2-regression problem on the subproblem.

A naïve version of this meta-algorithm gives a (1 + ε) relative-error approximation in roughly O(mn^2/ε) time (DMM 2006, 2008). (Ugh.)
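A minimal RAM sketch of the naïve version, assuming exact leverage scores computed via QR (hence the O(mn^2) cost); rescaling each sampled row by 1/sqrt(s·p_i) keeps the subproblem an unbiased estimator of the full one. All names are illustrative.

Code sketch (Python):

import numpy as np

def leverage_sample_ls(A, b, s, rng=np.random.default_rng(0)):
    Q, _ = np.linalg.qr(A)                     # exact leverage scores (naive)
    p = np.sum(Q**2, axis=1) / A.shape[1]      # importance sampling distribution {p_i}
    idx = rng.choice(A.shape[0], size=s, p=p)  # step 2: sample s constraints
    w = 1.0 / np.sqrt(s * p[idx])              # rescale the sampled rows
    x, *_ = np.linalg.lstsq(w[:, None] * A[idx], w * b[idx], rcond=None)
    return x                                   # step 3: solve the subproblem

A = np.random.randn(100000, 20); b = np.random.randn(100000)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
x_hat = leverage_sample_ls(A, b, s=2000)
print(np.linalg.norm(A @ x_hat - b) / np.linalg.norm(A @ x_star - b))  # close to 1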

Meta-algorithm for ℓ2-norm regression (2 of 3)
(Drineas, Mahoney, etc., 2006, 2008, etc., starting with SODA 2006; Mahoney FnTML, 2011.)

Randomly sample high-leverage constraints.
Solve the subproblem.

(In many moderately large-scale applications, one uses “ℓ2 objectives,” not since they are “right,” but since other things are even more expensive.)

Meta-algorithm for ℓ2-norm regression (3 of 3)
(Drineas, Mahoney, etc., 2006, 2008, etc., starting with SODA 2006; Mahoney FnTML, 2011.)

We can make this meta-algorithm “fast” in RAM: it runs in O(mn log n/ε) time if:
  we perform a Hadamard-based random projection and sample uniformly in a randomly rotated basis, or
  we quickly compute approximations to the statistical leverage scores and use those as an importance sampling distribution.
(Sarlós 2006; Drineas, Mahoney, Muthu, Sarlós 2010; Drineas, Magdon-Ismail, Mahoney, Woodruff 2011.)

We can make this meta-algorithm “high precision” in RAM: it runs in O(mn log n log(1/ε)) time if:
  we use the random projection/sampling basis to construct a preconditioner and couple it with a traditional iterative method.
(Rokhlin & Tygert 2008; Avron, Maymounkov, & Toledo 2010; Meng, Saunders, & Mahoney 2011.)
(Mahoney, “Randomized Algorithms for Matrices and Data,” FnTML, 2011.)
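A minimal numerical illustration of the “high precision” idea, assuming a Gaussian sketch for simplicity (rather than a Hadamard-based one): the R factor from a QR of the sketched matrix is a good preconditioner, so cond(A R^{-1}) is O(1) regardless of cond(A), and an iterative solver on the preconditioned system then needs only O(log(1/ε)) iterations. Sizes and names are illustrative.

Code sketch (Python):

import numpy as np

m, n = 20000, 50
A = np.random.randn(m, n) * np.logspace(0, 6, n)  # ill-conditioned: cond(A) ~ 1e6
S = np.random.randn(4 * n, m) / np.sqrt(4 * n)    # Gaussian sketch, s = 4n rows
_, R = np.linalg.qr(S @ A)                        # R from QR of the sketched matrix
AR = A @ np.linalg.inv(R)                         # explicit inverse: illustration only
print(np.linalg.cond(A))    # huge
print(np.linalg.cond(AR))   # O(1), independent of cond(A)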
Randomized regression in RAM: Implementations
Avron, Maymounkov, and Toledo, SISC, 32, 1217–1236, 2010.

Conclusions:
The randomized algorithm “beats Lapack’s direct dense least-squares solver by a large margin on essentially any dense tall matrix.”
These results “suggest that random projection algorithms should be incorporated into future versions of Lapack.”
Randomized regression in RAM: Human Genetics
Paschou et al., PLoS Gen ’07; Paschou et al., J Med Gen ’10; Drineas et al., PLoS ONE ’10; Javed et al., Ann Hum Gen ’11.

Computing large rectangular regressions/SVDs/CUR decompositions:
On commodity hardware (e.g., a 4GB RAM, dual-core laptop), using MatLab 7.0 (R14), the computation of the SVD of the dense 2,240 × 447,143 matrix A takes about 20 minutes.
Computing this SVD is not a one-liner: we cannot load the whole matrix into RAM (it runs out of memory in MatLab). Instead, compute the SVD of AAᵀ.
In a similar experiment, compute 1,200 SVDs on matrices of dimensions (approx.) 1,200 × 450,000 (roughly, a full leave-one-out cross-validation experiment).
A retrospective

Randomized matrix algorithms:
BIG success story in high precision scientific computing applications
and large-scale statistical data analysis!
Can they really be implemented in parallel and distributed
environments for LARGE-scale statistical data analysis?

Outline

1. Randomized regression in RAM
2. Solving ℓ2 regression using MPI
3. Solving ℓ1 and quantile regression on MapReduce
Algorithm LSRN (for strongly over-determined systems)
(Meng, Saunders, and Mahoney 2011)

1: Perform a Gaussian random projection
2: Construct a preconditioner from the subsample
3: Iterate with a “traditional” iterative algorithm

Why a Gaussian random projection? Since it
  provides the strongest results for conditioning,
  uses level-3 BLAS on dense matrices and can be generated super fast,
  speeds up automatically on sparse matrices and fast operators,
  still works (with an extra “allreduce” operation) when A is partitioned along its bigger dimension.
Although it is “slow” (compared with “fast” Hadamard-based projections in terms of FLOPS), it allows for better communication properties.
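A single-machine sketch of these three steps, assuming dense numpy arrays and scipy’s LSQR; the actual LSRN distributes the matvecs and random number generation, and forming A @ N explicitly as below is for illustration only.

Code sketch (Python):

import numpy as np
from scipy.sparse.linalg import lsqr

def lsrn(A, b, oversample=2.0, rng=np.random.default_rng(0)):
    m, n = A.shape                                  # assume m >> n
    s = int(oversample * n)
    GA = rng.standard_normal((s, m)) @ A            # 1: Gaussian random projection
    _, sig, Vt = np.linalg.svd(GA, full_matrices=False)
    N = Vt.T / sig                                  # 2: preconditioner; A @ N is well-conditioned
    y = lsqr(A @ N, b, atol=1e-14, btol=1e-14)[0]   # 3: iterate (LSQR here; CS in the paper)
    return N @ y

A = np.random.randn(20000, 100) * np.logspace(0, 5, 100)
b = np.random.randn(20000)
print(np.linalg.norm(lsrn(A, b) - np.linalg.lstsq(A, b, rcond=None)[0]))  # tiny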
Implementation of LSRN
(Meng, Saunders, and Mahoney 2011)

Shared memory (C++ with MATLAB interface)
  Multi-threaded ziggurat random number generator (Marsaglia and Tsang 2000), generating 10^9 numbers in less than 2 seconds on 12 CPU cores.
  A naïve implementation of multi-threaded dense-sparse matrix multiplications.

Message passing (Python)
  Single-threaded BLAS for matrix-matrix and matrix-vector products.
  Multi-threaded BLAS/LAPACK for SVD.
  Using the Chebyshev semi-iterative method (Golub and Varga 1961) instead of LSQR.

Solving real-world problems

matrix   | m     | n     | nnz    | rank | cond  | DGELSD | A\b     | Blendenpik | LSRN
landmark | 71952 | 2704  | 1.15e6 | 2671 | 1.0e8 | 29.54  | 0.6498* | –          | 17.55
rail4284 | 4284  | 1.1e6 | 1.1e7  | full | 400.0 | > 3600 | 1.203*  | OOM        | 136.0
tnimg_1  | 951   | 1e6   | 2.1e7  | 925  | –     | 630.6  | 1067*   | –          | 36.02
tnimg_2  | 1000  | 2e6   | 4.2e7  | 981  | –     | 1291   | > 3600* | –          | 72.05
tnimg_3  | 1018  | 3e6   | 6.3e7  | 1016 | –     | 2084   | > 3600* | –          | 111.1
tnimg_4  | 1019  | 4e6   | 8.4e7  | 1018 | –     | 2945   | > 3600* | –          | 147.1
tnimg_5  | 1023  | 5e6   | 1.1e8  | full | –     | > 3600 | > 3600* | OOM        | 188.5

Table: Real-world problems and corresponding running times. DGELSD doesn’t take advantage of sparsity. Though MATLAB’s backslash may not give the min-length solutions to rank-deficient or under-determined problems, we still report its running times. Blendenpik either doesn’t apply to rank-deficient problems or runs out of memory (OOM). LSRN’s running time is mainly determined by the problem size and the sparsity.

Iterating with LSQR
(Paige and Saunders 1982)

Code snippet (Python):

u = A.matvec(v) - alpha * u
beta = sqrt(comm.allreduce(np.dot(u, u)))
...
v = comm.allreduce(A.rmatvec(u)) - beta * v
Cost per iteration:
two matrix-vector multiplications
two cluster-wide synchronizations

Iterating with Chebyshev semi-iterative (CS) method
(Golub and Varga 1961)

The strong concentration results on σmax(AN) and σmin(AN) enable use of the CS method, which requires an accurate bound on the extreme singular values to work efficiently.
Code snippet (Python):

v = comm.allreduce(A.rmatvec(r)) - beta * v
x += alpha * v
r -= alpha * A.matvec(v)
Cost per iteration:
two matrix-vector multiplications
one cluster-wide synchronization

LSQR vs. CS on an Amazon EC2 cluster
(Meng, Saunders, and Mahoney 2011)

solver       | Nnodes | Nprocesses | m    | n   | nnz   | Niter | Titer | Ttotal
LSRN w/ CS   | 2      | 4          | 1024 | 4e6 | 8.4e7 | 106   | 34.03 | 170.4
LSRN w/ LSQR | 2      | 4          | 1024 | 4e6 | 8.4e7 | 84    | 41.14 | 178.6
LSRN w/ CS   | 5      | 10         | 1024 | 1e7 | 2.1e8 | 106   | 50.37 | 193.3
LSRN w/ LSQR | 5      | 10         | 1024 | 1e7 | 2.1e8 | 84    | 68.72 | 211.6
LSRN w/ CS   | 10     | 20         | 1024 | 2e7 | 4.2e8 | 106   | 73.73 | 220.9
LSRN w/ LSQR | 10     | 20         | 1024 | 2e7 | 4.2e8 | 84    | 102.3 | 249.0
LSRN w/ CS   | 20     | 40         | 1024 | 4e7 | 8.4e8 | 106   | 102.5 | 255.6
LSRN w/ LSQR | 20     | 40         | 1024 | 4e7 | 8.4e8 | 84    | 137.2 | 290.2

Table: Test problems on an Amazon EC2 cluster and corresponding running times in seconds. Though the CS method takes more iterations, it actually runs faster than LSQR by making only one cluster-wide synchronization per iteration.

Outline

1. Randomized regression in RAM
2. Solving ℓ2 regression using MPI
3. Solving ℓ1 and quantile regression on MapReduce
Comparison of three types of regression problems

              | ℓ2 regression | ℓ1 regression | quantile regression
estimation    | mean          | median        | quantile τ
loss function | z²            | |z|           | ρτ(z) = τz if z ≥ 0; (τ − 1)z if z < 0
formulation   | ‖Ax − b‖₂²    | ‖Ax − b‖₁     | ‖ρτ(Ax − b)‖₁
is a norm?    | yes           | yes           | no

These measure different functions of the response variable given certain values of the predictor variables.
[Figure: the three loss functions on z ∈ [−1, 1]: z² (L2 regression), |z| (L1 regression), and ρτ(z) (quantile regression).]
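A minimal numpy sketch (not from the slides) illustrating the table: minimizing the average loss over a constant fit c recovers the mean for z², the median for |z|, and the τ-quantile for ρτ. The data here is arbitrary.

Code sketch (Python):

import numpy as np

def rho_tau(z, tau):
    return np.where(z >= 0, tau * z, (tau - 1) * z)  # pinball / quantile loss

rng = np.random.default_rng(0)
b = rng.exponential(size=20000)                      # skewed responses
tau = 0.9
cs = np.linspace(0.0, 6.0, 2001)
avg_loss = np.array([rho_tau(b - c, tau).mean() for c in cs])
print(cs[avg_loss.argmin()], np.quantile(b, tau))    # nearly equal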
“Everything generalizes” from ℓ2 regression to ℓ1 regression
(But “everything generalizes messily,” since ℓ1 is “worse” than ℓ2.)

A matrix U ∈ R^{m×n} is (α, β, p = 1)-conditioned if ‖U‖₁ ≤ α and ‖x‖∞ ≤ β ‖Ux‖₁, ∀x; it is an ℓ1-well-conditioned basis if α, β = poly(n).

Define the ℓ1 leverage scores of an m × n matrix A, with m > n, as the ℓ1-norms-squared of the rows of an ℓ1-well-conditioned basis of A.

Define the ℓ1-norm condition number of A, denoted by κ1(A), as:

    κ1(A) = σ1max(A)/σ1min(A) = (max_{‖x‖₂=1} ‖Ax‖₁) / (min_{‖x‖₂=1} ‖Ax‖₁).

This implies: σ1min(A) ‖x‖₂ ≤ ‖Ax‖₁ ≤ σ1max(A) ‖x‖₂, ∀x ∈ Rⁿ.
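A crude Monte Carlo sketch (not from the slides) of this definition: sampling random unit vectors gives a lower bound on σ1max and an upper bound on σ1min, hence a lower bound on κ1(A). The function name is illustrative.

Code sketch (Python):

import numpy as np

def kappa1_lower_bound(A, trials=2000, rng=np.random.default_rng(0)):
    X = rng.standard_normal((A.shape[1], trials))
    X /= np.linalg.norm(X, axis=0)     # random unit vectors as columns
    v = np.abs(A @ X).sum(axis=0)      # ||Ax||_1 for each sample
    return v.max() / v.min()           # underestimates kappa_1(A)

print(kappa1_lower_bound(np.random.randn(2000, 10)))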
Making ℓ1 and quantile work to low and high precision

Finding a good basis (to get a low-precision solution):

name                       | running time                 | κ                        | type  | passes (for soln)
SCT [SW11]                 | O(mn² log n)                 | O(n^{5/2} log^{3/2} m)   | QR    | 2
FCT [CDMMMW13]             | O(mn log n)                  | O(n^{7/2} log^{5/2} m)   | QR    | 2
Ellipsoid rounding [Cla05] | O(mn⁵ log m)                 | n^{3/2}(n + 1)^{1/2}     | ER    | n⁴
Fast ER [CDMMMW13]         | O(mn³ log m)                 | 2n²                      | ER    | n²
SPC1 [MM13]                | O(nnz(A) · log m)            | O(n^{13/2} log^{11/2} n) | QR    | 2
SPC2 [MM13]                | O(nnz(A) · log m) + ER small | 6n²                      | QR+ER | 3
SPC3 [YMM13]               | O(nnz(A) · log m) + QR small | O(n^{19/4} log^{11/4} n) | QR+QR | 3

Iteratively solving (to get more digits of precision):

method                                                     | passes          | extra work per pass
subgradient (Clarkson 2005)                                | O(n⁴/ε²)        | —
gradient (Nesterov 2009)                                   | O(m^{1/2}/ε)    | —
ellipsoid (Nemirovski and Yudin 1972)                      | O(n² log(κ1/ε)) | —
inscribed ellipsoids (Tarasov, Khachiyan, and Erlikh 1988) | O(n log(κ1/ε))  | O(n^{7/2} log n)
Prior work and evaluations
Evaluate on real and simulated data:
Simulated data, size ca. 10^9 × 10^2, designed to have “bad” nonuniformities.
Real US Census data, size ca. 10^7 × 10, or “stacked” to size ca. 10^10 × 10.

[Figure: quantile regression coefficients as a function of quantile for (a) Unmarried and (b) Education on the Census data; (c) legend: solution of LS regression, solution of LAD regression, solution of quantile regression, approximate solution of quantile, 90% confidence intervals.]

State of the art (due to Portnoy and Koenker, 1997):
The standard solver for quantile regression is the interior-point method ipm, applicable to problems of size about 10^6 × 50.
The best previous sampling algorithm for quantile regression, prqfn, uses an interior-point method on a smaller randomly-constructed subproblem.
A MapReduce implementation
Inputs: A ∈ R^{m×n} and κ1 such that

    ‖x‖₂ ≤ ‖Ax‖₁ ≤ κ1 ‖x‖₂, ∀x,

c ∈ Rⁿ, sample size s, and number of subsampled solutions nx.

Mapper:
1. For each row aᵢ of A, let pᵢ = min{s ‖aᵢ‖₁/(κ1 n^{1/2}), 1}.
2. For k = 1, ..., nx, emit (k, aᵢ/pᵢ) with probability pᵢ.

Reducer:
1. Collect the row vectors associated with key k and assemble Aₖ.
2. Compute x̂ₖ = arg min_{cᵀx=1} ‖Aₖx‖₁ using interior-point methods.
3. Return x̂ₖ.

Note that multiple subsampled solutions can be computed in a single pass.
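A local Python sketch of this mapper/reducer pair, with the Hadoop machinery simulated by a dictionary and the ℓ1 subproblem min ‖Aₖx‖₁ s.t. cᵀx = 1 solved as a linear program via scipy’s linprog rather than a dedicated interior-point code; all names, and the toy value of κ1, are illustrative.

Code sketch (Python):

import numpy as np
from collections import defaultdict
from scipy.optimize import linprog

def mapper(A, s, kappa1, n_x, rng):
    m, n = A.shape
    p = np.minimum(s * np.abs(A).sum(axis=1) / (kappa1 * np.sqrt(n)), 1.0)
    for i in range(m):
        for k in range(n_x):
            if rng.random() < p[i]:
                yield k, A[i] / p[i]           # emit rescaled row under key k

def reducer(rows, c):
    Ak = np.vstack(rows); mk, n = Ak.shape
    # LP: minimize sum(t) s.t. -t <= Ak x <= t, c^T x = 1, variables z = [x; t]
    obj = np.r_[np.zeros(n), np.ones(mk)]
    A_ub = np.block([[Ak, -np.eye(mk)], [-Ak, -np.eye(mk)]])
    A_eq = np.r_[c, np.zeros(mk)][None, :]
    res = linprog(obj, A_ub=A_ub, b_ub=np.zeros(2 * mk), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] * n + [(0, None)] * mk)
    return res.x[:n]

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 5)); c = np.r_[1.0, np.zeros(4)]
buckets = defaultdict(list)
for k, row in mapper(A, s=200, kappa1=3000.0, n_x=3, rng=rng):
    buckets[k].append(row)
xs = [reducer(rows, c) for rows in buckets.values()]  # one x_hat_k per key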

Evaluation on large-scale ℓ1 regression problem

[Figure: entry-wise absolute errors |xⱼ − xⱼ*| vs. index j, for methods cauchy, gaussian, nocd, and unif.]
First (solid) and the third (dashed) quartiles of entry-wise absolute errors (on synthetic
data that has “bad” nonuniformities).

              | ‖x − x*‖₁/‖x*‖₁  | ‖x − x*‖₂/‖x*‖₂   | ‖x − x*‖∞/‖x*‖∞
CT (Cauchy)   | [0.008, 0.0115]  | [0.00895, 0.0146] | [0.0113, 0.0211]
GT (Gaussian) | [0.0126, 0.0168] | [0.0152, 0.0232]  | [0.0184, 0.0366]
NOCD          | [0.0823, 22.1]   | [0.126, 70.8]     | [0.193, 134]
UNIF          | [0.0572, 0.0951] | [0.089, 0.166]    | [0.129, 0.254]

First and third quartiles of relative errors in 1-, 2-, and ∞-norms on a data set of size 10^10 × 15. CT (and FCT) clearly performs the best. GT is worse but follows closely. NOCD and UNIF are much worse. (Similar results for size 10^10 × 100 if SPC2 is used.)
The Method of Inscribed Ellipsoids (MIE)
MIE works similarly to the bisection method, but in a higher dimension.
Why do we choose MIE?
Least number of iterations
Initialization using all the subsampled solutions
Multiple queries per iteration
At each iteration, we need to compute (1) a function value and (2) a gradient/subgradient.
For each subsampled solution, we have a hemisphere that contains the optimal solution.
We use all these solution hemispheres to construct the initial search region.

Computing multiple f and g in a single pass
On MapReduce, the IO cost may dominate the computational cost, which favors algorithms that do more computation in a single pass.

Single query:

    f(x) = ‖Ax‖₁,  g(x) = Aᵀ sign(Ax).

Multiple queries:

    F(X) = sum(|AX|, 0),  G(X) = Aᵀ sign(AX).

An example on a 10-node Hadoop cluster:
  A: 10^8 × 50, 118.7GB.
  A single query: 282 seconds.
  100 queries in a single pass: 328 seconds.
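A minimal numpy sketch of the single-query and batched forms; on MapReduce the single scan of A is shared across all columns of X, which is why 100 queries cost barely more than one.

Code sketch (Python):

import numpy as np

def f_g(A, x):                 # single query
    r = A @ x
    return np.abs(r).sum(), A.T @ np.sign(r)

def F_G(A, X):                 # many queries, one pass over A
    R = A @ X                  # X: n x nq; each column is a query point
    return np.abs(R).sum(axis=0), A.T @ np.sign(R)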

MIE with sampling initialization and multiple queries

[Figure: relative error (left) and (f − f*)/f* (right) vs. number of iterations. (d) size 10^6 × 20: standard IPCPM vs. proposed IPCPM. (e) size 5.24e9 × 15: mie, mie w/ multi q, mie w/ sample init, mie w/ sample init and multi q.]

Comparing different MIE methods on a large and a LARGE ℓ1 regression problem.
Conclusion

Randomized regression in parallel & distributed environments:
different design principles for high-precision versus low-precision
Least Squares Approximation
Least Absolute Deviations
Quantile Regression

Algorithms require more computation than traditional matrix
algorithms, but they have better communication profiles.
On MPI: Chebyshev semi-iterative method vs. LSQR.
On MapReduce: Method of inscribed ellipsoids with multiple queries.
Look beyond FLOPS in parallel and distributed environments.


Mais conteúdo relacionado

Mais procurados

Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16MLconf
 
Neural networks and google tensor flow
Neural networks and google tensor flowNeural networks and google tensor flow
Neural networks and google tensor flowShannon McCormick
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural NetworksDatabricks
 
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban DonatoEsteban Donato
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowEmanuel Di Nardo
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlowBarbara Fusinska
 
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15MLconf
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesHJ van Veen
 
Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Sri Ambati
 
How to win data science competitions with Deep Learning
How to win data science competitions with Deep LearningHow to win data science competitions with Deep Learning
How to win data science competitions with Deep LearningSri Ambati
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16MLconf
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorizationrecsysfr
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlowDarshan Patel
 
H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614Sri Ambati
 
Deep Learning: Chapter 11 Practical Methodology
Deep Learning: Chapter 11 Practical MethodologyDeep Learning: Chapter 11 Practical Methodology
Deep Learning: Chapter 11 Practical MethodologyJason Tsai
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksDatabricks
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at ScaleSri Ambati
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Oswald Campesato
 

Mais procurados (20)

Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
 
Neural networks and google tensor flow
Neural networks and google tensor flowNeural networks and google tensor flow
Neural networks and google tensor flow
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
 
Tensor flow
Tensor flowTensor flow
Tensor flow
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlow
 
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar Dresses
 
Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013
 
How to win data science competitions with Deep Learning
How to win data science competitions with Deep LearningHow to win data science competitions with Deep Learning
How to win data science competitions with Deep Learning
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
 
H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614
 
Deep Learning: Chapter 11 Practical Methodology
Deep Learning: Chapter 11 Practical MethodologyDeep Learning: Chapter 11 Practical Methodology
Deep Learning: Chapter 11 Practical Methodology
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at Scale
 
Neural network
Neural networkNeural network
Neural network
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)
 

Semelhante a Mahoney mlconf-nov13

Towards reducing the
Towards reducing theTowards reducing the
Towards reducing theIJDKP
 
Poor man's missing value imputation
Poor man's missing value imputationPoor man's missing value imputation
Poor man's missing value imputationLeonardo Auslender
 
Sparse inverse covariance estimation using skggm
Sparse inverse covariance estimation using skggmSparse inverse covariance estimation using skggm
Sparse inverse covariance estimation using skggmManjari Narayan
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009Matthew Magistrado
 
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...EmadfHABIB2
 
A Dynamic Systems Approach to Production Management in the Automotive Industry
A Dynamic Systems Approach to Production Management in the Automotive IndustryA Dynamic Systems Approach to Production Management in the Automotive Industry
A Dynamic Systems Approach to Production Management in the Automotive IndustryFrancisco Restivo
 
DMDW Lesson 05 + 06 + 07 - Data Mining Applied
DMDW Lesson 05 + 06 + 07 - Data Mining AppliedDMDW Lesson 05 + 06 + 07 - Data Mining Applied
DMDW Lesson 05 + 06 + 07 - Data Mining AppliedJohannes Hoppe
 
Pak eko 4412ijdms01
Pak eko 4412ijdms01Pak eko 4412ijdms01
Pak eko 4412ijdms01hyuviridvic
 
ProbabilisticModeling20080411
ProbabilisticModeling20080411ProbabilisticModeling20080411
ProbabilisticModeling20080411Clay Stanek
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...theijes
 
Hybrid Meta-Heuristic Algorithms For Solving Network Design Problem
Hybrid Meta-Heuristic Algorithms For Solving Network Design ProblemHybrid Meta-Heuristic Algorithms For Solving Network Design Problem
Hybrid Meta-Heuristic Algorithms For Solving Network Design ProblemAlana Cartwright
 
Massive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsMassive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsVijay Raghavan
 
ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
ABSTAT: Ontology-driven Linked Data Summaries with Pattern MinimalizationABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
ABSTAT: Ontology-driven Linked Data Summaries with Pattern MinimalizationBlerina Spahiu
 
Bat algorithm and applications
Bat algorithm and applicationsBat algorithm and applications
Bat algorithm and applicationsMd.Al-imran Roton
 
PyData Miami 2019, Quantum Generalized Linear Models
PyData Miami 2019, Quantum Generalized Linear ModelsPyData Miami 2019, Quantum Generalized Linear Models
PyData Miami 2019, Quantum Generalized Linear ModelsColleen Farrelly
 
Measuring and Predicting Departures from Routine in Human Mobility
Measuring and Predicting Departures from Routine in Human MobilityMeasuring and Predicting Departures from Routine in Human Mobility
Measuring and Predicting Departures from Routine in Human MobilityDirk Gorissen
 

Semelhante a Mahoney mlconf-nov13 (20)

Towards reducing the
Towards reducing theTowards reducing the
Towards reducing the
 
Poor man's missing value imputation
Poor man's missing value imputationPoor man's missing value imputation
Poor man's missing value imputation
 
SENIOR COMP FINAL
SENIOR COMP FINALSENIOR COMP FINAL
SENIOR COMP FINAL
 
Sparse inverse covariance estimation using skggm
Sparse inverse covariance estimation using skggmSparse inverse covariance estimation using skggm
Sparse inverse covariance estimation using skggm
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009
 
slides
slidesslides
slides
 
2018 Modern Math Workshop - Nonparametric Regression and Classification for M...
2018 Modern Math Workshop - Nonparametric Regression and Classification for M...2018 Modern Math Workshop - Nonparametric Regression and Classification for M...
2018 Modern Math Workshop - Nonparametric Regression and Classification for M...
 
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
Updated (version 2.3 THRILLER) Easy Perspective to (Complexity)-Thriller 12 S...
 
A Dynamic Systems Approach to Production Management in the Automotive Industry
A Dynamic Systems Approach to Production Management in the Automotive IndustryA Dynamic Systems Approach to Production Management in the Automotive Industry
A Dynamic Systems Approach to Production Management in the Automotive Industry
 
DMDW Lesson 05 + 06 + 07 - Data Mining Applied
DMDW Lesson 05 + 06 + 07 - Data Mining AppliedDMDW Lesson 05 + 06 + 07 - Data Mining Applied
DMDW Lesson 05 + 06 + 07 - Data Mining Applied
 
Pak eko 4412ijdms01
Pak eko 4412ijdms01Pak eko 4412ijdms01
Pak eko 4412ijdms01
 
ProbabilisticModeling20080411
ProbabilisticModeling20080411ProbabilisticModeling20080411
ProbabilisticModeling20080411
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
 
Hybrid Meta-Heuristic Algorithms For Solving Network Design Problem
Hybrid Meta-Heuristic Algorithms For Solving Network Design ProblemHybrid Meta-Heuristic Algorithms For Solving Network Design Problem
Hybrid Meta-Heuristic Algorithms For Solving Network Design Problem
 
Massive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsMassive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and Applications
 
18 786
18 78618 786
18 786
 
ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
ABSTAT: Ontology-driven Linked Data Summaries with Pattern MinimalizationABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
ABSTAT: Ontology-driven Linked Data Summaries with Pattern Minimalization
 
Bat algorithm and applications
Bat algorithm and applicationsBat algorithm and applications
Bat algorithm and applications
 
PyData Miami 2019, Quantum Generalized Linear Models
PyData Miami 2019, Quantum Generalized Linear ModelsPyData Miami 2019, Quantum Generalized Linear Models
PyData Miami 2019, Quantum Generalized Linear Models
 
Measuring and Predicting Departures from Routine in Human Mobility
Measuring and Predicting Departures from Routine in Human MobilityMeasuring and Predicting Departures from Routine in Human Mobility
Measuring and Predicting Departures from Routine in Human Mobility
 

Mais de MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Mais de MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Último

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Último (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Mahoney mlconf-nov13
• 7. Large-scale environments and how they scale
Shared memory: cores [10, 10^3]; memory [100GB, 100TB]
Message passing: cores [200, 10^5]; memory [1TB, 1000TB]
CUDA: cores [5 × 10^4, 3 × 10^6]; GPU memory [500GB, 20TB]
MapReduce: cores [40, 10^5]; memory [240GB, 100TB]; storage [100TB, 100PB]
Distributed computing: cores [-, 3 × 10^5]
(Sources: http://www.sgi.com/pdfs/4358.pdf ; http://www.top500.org/list/2011/11/100 ; http://i.top500.org/site/50310 ; http://www.cloudera.com/blog/2010/04/pushing-the-limits-of-distributed-processing/ ; http://hortonworks.com/blog/an-introduction-to-hdfs-federation/ ; http://fah-web.stanford.edu/cgi-bin/main.py?qtype=osstats )
• 8. Outline
1 Randomized regression in RAM
2 Solving ℓ2 regression using MPI
3 Solving ℓ1 and quantile regression on MapReduce
• 9. Two important notions: leverage and condition
(Mahoney, "Randomized Algorithms for Matrices and Data," FnTML, 2011.)
Statistical leverage. (Think: eigenvectors. Important for low-precision.)
The statistical leverage scores of A (assume m ≫ n) are the diagonal elements of the projection matrix onto the column span of A.
They equal the ℓ2-norms-squared of the rows of any orthogonal basis spanning A.
They measure:
  how well-correlated the singular vectors are with the canonical basis
  which constraints have largest "influence" on the LS fit
  a notion of "coherence" or "outlierness"
Computing them exactly is as hard as solving the LS problem.
Condition number. (Think: eigenvalues. Important for high-precision.)
The ℓ2-norm condition number of A is κ(A) = σ_max(A)/σ⁺_min(A).
κ(A) bounds the number of iterations; for ill-conditioned problems (e.g., κ(A) ≈ 10^6 ≫ 1), the convergence speed is very slow.
Computing κ(A) is generally as hard as solving the LS problem.
These are for the ℓ2-norm; generalizations exist for the ℓ1-norm.
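To make the two notions concrete, here is a minimal numpy sketch (assuming a dense A that fits in RAM): exact leverage scores from a thin QR, and the condition number from the SVD. Both cost as much as solving the LS problem itself, which is exactly why the fast approximations discussed below matter.

Code sketch (Python):

import numpy as np

def leverage_scores(A):
    # Squared row norms of an orthogonal basis Q for span(A),
    # i.e., the diagonal of the projection matrix Q Q^T.
    Q, _ = np.linalg.qr(A)          # thin QR: O(mn^2), same cost as solving LS
    return np.sum(Q**2, axis=1)

A = np.random.randn(10_000, 5)
lev = leverage_scores(A)
kappa = np.linalg.cond(A)           # sigma_max(A) / sigma_min(A), via the SVD
assert np.isclose(lev.sum(), 5)     # leverage scores sum to rank(A) = n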
• 10. Meta-algorithm for ℓ2-norm regression (1 of 3)
(Drineas, Mahoney, etc., 2006, 2008, etc., starting with SODA 2006; Mahoney FnTML, 2011.)
1: Using the ℓ2 statistical leverage scores of A, construct an importance sampling distribution {p_i}_{i=1}^m.
2: Randomly sample a small number of constraints according to {p_i}_{i=1}^m to construct a subproblem.
3: Solve the ℓ2-regression problem on the subproblem.
A naïve version of this meta-algorithm gives a (1 + ε) relative-error approximation in roughly O(mn^2/ε) time (DMM 2006, 2008). (Ugh.)
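A hedged serial sketch of the three steps (assumptions: exact leverage scores, so this is the naive O(mn^2) version; the sample size s = 2000 is an arbitrary illustration):

Code sketch (Python):

import numpy as np

def leverage_sample_ls(A, b, s, rng=np.random.default_rng(0)):
    Q, _ = np.linalg.qr(A)
    lev = np.sum(Q**2, axis=1)
    p = lev / lev.sum()                        # 1: importance sampling distribution
    idx = rng.choice(A.shape[0], size=s, p=p)  # 2: sample s constraints
    w = 1.0 / np.sqrt(s * p[idx])              # rescale rows to keep the subproblem unbiased
    x_hat, *_ = np.linalg.lstsq(w[:, None] * A[idx], w * b[idx], rcond=None)
    return x_hat                               # 3: solve the subproblem

m, n = 100_000, 20
A = np.random.randn(m, n)
b = A @ np.ones(n) + 0.1 * np.random.randn(m)
x_hat = leverage_sample_ls(A, b, s=2000)       # close to the full LS solution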
• 11. Meta-algorithm for ℓ2-norm regression (2 of 3)
(Drineas, Mahoney, etc., 2006, 2008, etc., starting with SODA 2006; Mahoney FnTML, 2011.)
Randomly sample high-leverage constraints; solve the subproblem.
(In many moderately large-scale applications, one uses "ℓ2 objectives," not since they are "right," but since other things are even more expensive.)
• 14. Meta-algorithm for ℓ2-norm regression (3 of 3)
(Drineas, Mahoney, etc., 2006, 2008, etc., starting with SODA 2006; Mahoney FnTML, 2011.)
We can make this meta-algorithm "fast" in RAM: it runs in O(mn log n/ε) time if:
  we perform a Hadamard-based random projection and sample uniformly in a randomly rotated basis, or
  we quickly compute approximations to the statistical leverage scores and use those as an importance sampling distribution.
(Sarlós 2006; Drineas, Mahoney, Muthu, Sarlós 2010; Drineas, Magdon-Ismail, Mahoney, Woodruff 2011.)
We can make this meta-algorithm "high precision" in RAM: it runs in O(mn log n log(1/ε)) time if:
  we use the random projection/sampling basis to construct a preconditioner and couple with a traditional iterative method.
(Rokhlin & Tygert 2008; Avron, Maymounkov, & Toledo 2010; Meng, Saunders, & Mahoney 2011.)
• 15. Randomized regression in RAM: Implementations
Avron, Maymounkov, and Toledo, SISC, 32, 1217–1236, 2010.
Conclusions: the randomized algorithm "beats Lapack's direct dense least-squares solver by a large margin on essentially any dense tall matrix." These results "suggest that random projection algorithms should be incorporated into future versions of Lapack."
• 16. Randomized regression in RAM: Human Genetics
Paschou et al., PLoS Gen '07; Paschou et al., J Med Gen '10; Drineas et al., PLoS ONE '10; Javed et al., Ann Hum Gen '11.
Computing large rectangular regressions/SVDs/CUR decompositions:
On commodity hardware (e.g., a 4GB RAM, dual-core laptop), using MatLab 7.0 (R14), the computation of the SVD of the dense 2,240 × 447,143 matrix A takes about 20 minutes.
Computing this SVD is not a one-liner: we can not load the whole matrix into RAM (it runs out of memory in MatLab), so instead we compute the SVD of AA^T.
In a similar experiment, compute 1,200 SVDs on matrices of dimensions (approx.) 1,200 × 450,000 (roughly, a full leave-one-out cross-validation experiment).
• 17. A retrospective
Randomized matrix algorithms: a BIG success story in high-precision scientific computing applications and large-scale statistical data analysis!
Can they really be implemented in parallel and distributed environments for LARGE-scale statistical data analysis?
• 18. Outline
1 Randomized regression in RAM
2 Solving ℓ2 regression using MPI
3 Solving ℓ1 and quantile regression on MapReduce
• 19. Algorithm LSRN (for strongly over-determined systems)
(Meng, Saunders, and Mahoney 2011)
1: Perform a Gaussian random projection.
2: Construct a preconditioner from the subsample.
3: Iterate with a "traditional" iterative algorithm.
Why a Gaussian random projection? Since it:
  provides the strongest results for conditioning,
  uses level 3 BLAS on dense matrices and can be generated super fast,
  speeds up automatically on sparse matrices and fast operators,
  still works (with an extra "allreduce" operation) when A is partitioned along its bigger dimension.
Although it is "slow" (compared with "fast" Hadamard-based projections in terms of FLOPS), it allows for better communication properties.
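A minimal serial sketch of the three steps (assumptions: dense A in RAM with full column rank, oversampling factor gamma = 2, and scipy's LSQR standing in for the distributed iteration):

Code sketch (Python):

import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def lsrn(A, b, gamma=2.0, rng=np.random.default_rng(0)):
    m, n = A.shape                              # strongly over-determined: m >> n
    s = int(np.ceil(gamma * n))
    G = rng.standard_normal((s, m))             # 1: Gaussian random projection
    _, S, Vt = np.linalg.svd(G @ A, full_matrices=False)
    N = Vt.T / S                                # 2: preconditioner; A @ N is well-conditioned
    AN = LinearOperator((m, n), matvec=lambda y: A @ (N @ y),
                        rmatvec=lambda z: N.T @ (A.T @ z))
    y = lsqr(AN, b, atol=1e-12, btol=1e-12)[0]  # 3: "traditional" iterative method
    return N @ y

A = np.random.randn(20_000, 50) * np.logspace(0, 6, 50)  # badly conditioned columns
b = np.random.randn(20_000)
x = lsrn(A, b)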
• 20. Implementation of LSRN
(Meng, Saunders, and Mahoney 2011)
Shared memory (C++ with MATLAB interface):
  Multi-threaded ziggurat random number generator (Marsaglia and Tsang 2000), generating 10^9 numbers in less than 2 seconds on 12 CPU cores.
  A naïve implementation of multi-threaded dense-sparse matrix multiplications.
Message passing (Python):
  Single-threaded BLAS for matrix-matrix and matrix-vector products.
  Multi-threaded BLAS/LAPACK for SVD.
  Using the Chebyshev semi-iterative method (Golub and Varga 1961) instead of LSQR.
• 21. Solving real-world problems
Real-world problems and corresponding running times (in seconds):

matrix     m      n      nnz     rank  cond   DGELSD  A\b      Blendenpik  LSRN
landmark   71952  2704   1.15e6  2671  1.0e8  29.54   0.6498*  -           17.55
rail4284   4284   1.1e6  1.1e7   full  400.0  >3600   1.203*   OOM         136.0
tnimg_1    951    1e6    2.1e7   925   -      630.6   1067*    -           36.02
tnimg_2    1000   2e6    4.2e7   981   -      1291    >3600*   -           72.05
tnimg_3    1018   3e6    6.3e7   1016  -      2084    >3600*   -           111.1
tnimg_4    1019   4e6    8.4e7   1018  -      2945    >3600*   -           147.1
tnimg_5    1023   5e6    1.1e8   full  -      >3600   >3600*   OOM         188.5

DGELSD doesn't take advantage of sparsity. Though MATLAB's backslash (A\b) may not give the min-length solutions to rank-deficient or under-determined problems, we still report its running times (*). Blendenpik either doesn't apply to rank-deficient problems (-) or runs out of memory (OOM). LSRN's running time is mainly determined by the problem size and the sparsity.
• 22. Iterating with LSQR
(Paige and Saunders 1982)
Code snippet (Python):

u = A.matvec(v) - alpha * u
beta = sqrt(comm.allreduce(np.dot(u, u)))    # norm needs a cluster-wide allreduce
...
v = comm.allreduce(A.rmatvec(u)) - beta * v  # second allreduce per iteration

Cost per iteration:
  two matrix-vector multiplications
  two cluster-wide synchronizations
• 23. Iterating with the Chebyshev semi-iterative (CS) method
(Golub and Varga 1961)
The strong concentration results on σ_max(AN) and σ_min(AN) enable use of the CS method, which requires an accurate bound on the extreme singular values to work efficiently.
Code snippet (Python):

v = comm.allreduce(A.rmatvec(r)) - beta * v  # the only allreduce per iteration
x += alpha * v
r -= alpha * A.matvec(v)                     # local matvec, no synchronization

Cost per iteration:
  two matrix-vector multiplications
  one cluster-wide synchronization
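To make the recurrence concrete, here is a self-contained serial sketch of one standard form of the Chebyshev semi-iteration applied to the normal equations (the coefficients are parameterized differently from the snippet above but implement the same method); sig_min and sig_max are the assumed singular-value bounds, which the Gaussian preconditioning supplies with high probability. Note that no inner products appear in the loop, which is exactly what removes the second synchronization:

Code sketch (Python):

import numpy as np

def chebyshev_ls(A, b, sig_min, sig_max, iters=100):
    # min ||Ax - b||_2, given 0 < sig_min < sig_max bounding the singular
    # values of A, so spec(A^T A) lies in [sig_min^2, sig_max^2].
    m, n = A.shape
    theta = (sig_max**2 + sig_min**2) / 2.0    # center of spec(A^T A)
    delta = (sig_max**2 - sig_min**2) / 2.0    # half-width of spec(A^T A)
    sigma1 = theta / delta
    rho = 1.0 / sigma1
    x = np.zeros(n)
    r = b.copy()                               # residual b - A x, lives in R^m
    d = (A.T @ r) / theta                      # search direction, lives in R^n
    for _ in range(iters):
        x += d                                 # no dot products anywhere: the
        r -= A @ d                             # coefficients depend only on the
        rho_new = 1.0 / (2.0 * sigma1 - rho)   # assumed singular-value bounds
        d = rho_new * rho * d + (2.0 * rho_new / delta) * (A.T @ r)
        rho = rho_new
    return x

In the MPI setting, A @ d and A.T @ r become matvec/rmatvec calls, and the single allreduce hides inside the rmatvec when A is partitioned by rows.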
• 24. LSQR vs. CS on an Amazon EC2 cluster
(Meng, Saunders, and Mahoney 2011)
Test problems on an Amazon EC2 cluster and corresponding running times in seconds:

Nnodes  Nprocesses  m     n    nnz     solver        Niter  Titer  Ttotal
2       4           1024  4e6  8.4e7   LSRN w/ CS    106    34.03  170.4
                                       LSRN w/ LSQR  84     41.14  178.6
5       10          1024  1e7  2.1e8   LSRN w/ CS    106    50.37  193.3
                                       LSRN w/ LSQR  84     68.72  211.6
10      20          1024  2e7  4.2e8   LSRN w/ CS    106    73.73  220.9
                                       LSRN w/ LSQR  84     102.3  249.0
20      40          1024  4e7  8.4e8   LSRN w/ CS    106    102.5  255.6
                                       LSRN w/ LSQR  84     137.2  290.2

Though the CS method takes more iterations, it actually runs faster than LSQR by making only one cluster-wide synchronization per iteration.
• 25. Outline
1 Randomized regression in RAM
2 Solving ℓ2 regression using MPI
3 Solving ℓ1 and quantile regression on MapReduce
• 26. Comparison of three types of regression problems

                 ℓ2 regression   ℓ1 regression   quantile regression
estimates        mean            median          quantile τ
loss function    z^2             |z|             ρ_τ(z) = τz if z ≥ 0, (τ − 1)z if z < 0
formulation      ‖Ax − b‖_2^2    ‖Ax − b‖_1      ρ_τ(Ax − b)
is a norm?       yes             yes             no

These measure different functions of the response variable given certain values of the predictor variables.
(Plots: the ℓ2, ℓ1, and quantile loss functions on [−1, 1].)
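The three losses side by side, as a small numpy sketch (the check at the end uses the fact that τ = 1/2 recovers half the LAD loss):

Code sketch (Python):

import numpy as np

def loss_l2(z):
    return z**2                                      # estimates the mean

def loss_l1(z):
    return np.abs(z)                                 # estimates the median

def loss_quantile(z, tau):
    return np.where(z >= 0, tau * z, (tau - 1) * z)  # estimates the tau-quantile

z = np.linspace(-1.0, 1.0, 5)
print(np.allclose(loss_quantile(z, 0.5), 0.5 * loss_l1(z)))  # True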
• 27. "Everything generalizes" from ℓ2 regression to ℓ1 regression
(But "everything generalizes messily," since ℓ1 is "worse" than ℓ2.)
A matrix U ∈ R^{m×n} is (α, β, p = 1)-conditioned if ‖U‖_1 ≤ α and ‖x‖_∞ ≤ β‖Ux‖_1, ∀x; it is an ℓ1-well-conditioned basis if α, β = poly(n).
Define the ℓ1 leverage scores of an m × n matrix A, with m > n, as the ℓ1-norms-squared of the rows of an ℓ1-well-conditioned basis of A.
Define the ℓ1-norm condition number of A, denoted κ_1(A), as:
  κ_1(A) = σ_1^max(A) / σ_1^min(A), where σ_1^max(A) = max_{‖x‖_2 = 1} ‖Ax‖_1 and σ_1^min(A) = min_{‖x‖_2 = 1} ‖Ax‖_1.
This implies: σ_1^min(A) ‖x‖_2 ≤ ‖Ax‖_1 ≤ σ_1^max(A) ‖x‖_2, ∀x ∈ R^n.
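These sphere extrema are easy to probe numerically. The sketch below is an illustration, not an algorithm: sampling random unit vectors only brackets the extrema from inside, so the printed ratio is a lower bound on κ_1(A).

Code sketch (Python):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 10))
X = rng.standard_normal((10, 10_000))
X /= np.linalg.norm(X, axis=0)           # random unit vectors in R^n
vals = np.abs(A @ X).sum(axis=0)         # ||Ax||_1 over the unit sphere
print(vals.min(), vals.max())            # inner bracket for [sigma_1^min, sigma_1^max]
print(vals.max() / vals.min())           # a lower bound on kappa_1(A)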
• 28. Making ℓ1 and quantile regression work to low and high precision
Finding a good basis (to get a low-precision solution):

name                        running time                    κ                         type    passes (for soln)
SCT [SW11]                  O(mn^2 log n)                   O(n^{5/2} log^{3/2} m)    QR      2
FCT [CDMMMW13]              O(mn log n)                     O(n^{7/2} log^{5/2} m)    QR      2
Ellipsoid rounding [Cla05]  O(mn^5 log m)                   n^{3/2}(n + 1)^{1/2}      ER      n^4
Fast ER [CDMMMW13]          O(mn^3 log m)                   2n^2                      ER      n^2
SPC1 [MM13]                 O(nnz(A) · log m)               O(n^{13/2} log^{11/2} n)  QR      2
SPC2 [MM13]                 O(nnz(A) · log m) + ER_small    6n^2                      QR+ER   3
SPC3 [YMM13]                O(nnz(A) · log m) + QR_small    O(n^{19/4} log^{11/4} n)  QR+QR   3

Iteratively solving (to get more digits of precision):

method                                                      passes             extra work per pass
subgradient (Clarkson 2005)                                 O(n^4 / ε^2)       -
gradient (Nesterov 2009)                                    O(m^{1/2} / ε)     -
ellipsoid (Nemirovski and Yudin 1972)                       O(n^2 log(κ_1/ε))  -
inscribed ellipsoids (Tarasov, Khachiyan, and Erlikh 1988)  O(n log(κ_1/ε))    O(n^{7/2} log n)
• 29. Prior work and evaluations
Evaluate on real and simulated data:
  Simulated data, size ca. 10^9 × 10^2, designed to have "bad" nonuniformities.
  Real US Census data, size ca. 10^7 × 10, or "stacked" to size ca. 10^10 × 10.
(Plots: solution coefficients vs. quantile for the (a) Unmarried and (b) Education variables, comparing the solutions of LS, LAD, and quantile regression, an approximate solution of quantile regression, and 90% confidence intervals.)
State of the art (due to Portnoy-Koenker, 1997):
  The standard solver for quantile regression is the interior-point method ipm, applicable for 10^6 × 50-sized problems.
  The best previous sampling algorithm for quantile regression, prqfn, uses an interior-point method on a smaller randomly-constructed subproblem.
• 30. A MapReduce implementation
Inputs: A ∈ R^{m×n} and κ_1 such that ‖x‖_2 ≤ ‖Ax‖_1 ≤ κ_1 ‖x‖_2, ∀x; c ∈ R^n; sample size s; and number of subsampled solutions n_x.
Mapper:
  1 For each row a_i of A, let p_i = min{s ‖a_i‖_1 / (κ_1 n^{1/2}), 1}.
  2 For k = 1, . . . , n_x, emit (k, a_i/p_i) with probability p_i.
Reducer:
  1 Collect the row vectors associated with key k and assemble A_k.
  2 Compute x̂_k = arg min_{c^T x = 1} ‖A_k x‖_1 using an interior-point method.
  3 Return x̂_k.
Note that multiple subsampled solutions can be computed in a single pass, as in the sketch below.
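A plain-Python, single-process simulation of this mapper/reducer pair (assumptions: scipy's LP solver stands in for a dedicated interior-point code, and the driver values kappa1 = 8000, s = 400, n_x = 3 are rough illustrative guesses for this Gaussian A, not certified constants):

Code sketch (Python):

import numpy as np
from scipy.optimize import linprog

def mapper(A, kappa1, s, n_x, rng):
    n = A.shape[1]
    for a_i in A:                                   # one pass over the rows of A
        p_i = min(s * np.abs(a_i).sum() / (kappa1 * np.sqrt(n)), 1.0)
        for k in range(n_x):                        # n_x subsamples in the same pass
            if rng.random() < p_i:
                yield k, a_i / p_i                  # emit the rescaled row

def reducer(rows, c):
    A_k = np.vstack(rows)
    s_k, n = A_k.shape
    # arg min_{c^T x = 1} ||A_k x||_1 as an LP in the variables [x; t]:
    # minimize 1^T t  subject to  -t <= A_k x <= t  and  c^T x = 1.
    obj = np.concatenate([np.zeros(n), np.ones(s_k)])
    A_ub = np.block([[A_k, -np.eye(s_k)], [-A_k, -np.eye(s_k)]])
    res = linprog(obj, A_ub=A_ub, b_ub=np.zeros(2 * s_k),
                  A_eq=np.concatenate([c, np.zeros(s_k)])[None, :], b_eq=[1.0],
                  bounds=[(None, None)] * n + [(0, None)] * s_k)
    return res.x[:n]

rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 5))
c = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
buckets = {}
for k, row in mapper(A, kappa1=8000.0, s=400, n_x=3, rng=rng):  # map + shuffle
    buckets.setdefault(k, []).append(row)
x_hats = [reducer(rows, c) for rows in buckets.values()]        # reduce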
• 31. Evaluation on a large-scale ℓ1 regression problem
(Plot: first (solid) and third (dashed) quartiles of entry-wise absolute errors |x_j − x*_j| by index, for cauchy, gaussian, nocd, and unif, on synthetic data that has "bad" nonuniformities.)

First and third quartiles of relative errors in 1-, 2-, and ∞-norms on a data set of size 10^10 × 15:

                         CT (Cauchy)        GT (Gaussian)      NOCD            UNIF
‖x − x*‖_1 / ‖x*‖_1      [0.008, 0.0115]    [0.0126, 0.0168]   [0.0823, 22.1]  [0.0572, 0.0951]
‖x − x*‖_2 / ‖x*‖_2      [0.00895, 0.0146]  [0.0152, 0.0232]   [0.126, 70.8]   [0.089, 0.166]
‖x − x*‖_∞ / ‖x*‖_∞      [0.0113, 0.0211]   [0.0184, 0.0366]   [0.193, 134]    [0.129, 0.254]

CT (and FCT) clearly performs the best. GT is worse but follows closely. NOCD and UNIF are much worse. (Similar results for size 10^10 × 100 if SPC2 is used.)
• 32. The Method of Inscribed Ellipsoids (MIE)
MIE works similarly to the bisection method, but in a higher dimension.
Why do we choose MIE?
  Least number of iterations.
  Initialization using all the subsampled solutions.
  Multiple queries per iteration.
At each iteration, we need to compute (1) a function value and (2) a gradient/subgradient.
For each subsampled solution, we have a hemisphere that contains the optimal solution; we use all of these solution hemispheres to construct the initial search region.
• 33. Computing multiple f and g in a single pass
On MapReduce, the IO cost may dominate the computational cost, which calls for algorithms that do more computation in a single pass.
Single query: f(x) = ‖Ax‖_1, g(x) = A^T sign(Ax).
Multiple queries: F(X) = sum(|AX|, 0), G(X) = A^T sign(AX).
An example on a 10-node Hadoop cluster:
  A: 10^8 × 50, 118.7GB.
  A single query: 282 seconds.
  100 queries in a single pass: 328 seconds.
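A numpy stand-in for one such pass (the point is only that a matrix-matrix product against A answers many queries for roughly the cost of reading A once):

Code sketch (Python):

import numpy as np

def f_g_single(A, x):
    Ax = A @ x
    return np.abs(Ax).sum(), A.T @ np.sign(Ax)

def F_G_batched(A, X):                   # one query point per column of X
    AX = A @ X
    return np.abs(AX).sum(axis=0), A.T @ np.sign(AX)

A = np.random.randn(10_000, 50)
X = np.random.randn(50, 100)             # 100 queries answered in one pass over A
F, G = F_G_batched(A, X)
assert np.isclose(F[0], f_g_single(A, X[:, 0])[0])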
• 34. MIE with sampling initialization and multiple queries
(Plots of relative error (f − f*)/f* vs. number of iterations: (d) size 10^6 × 20, standard IPCPM vs. proposed IPCPM; (e) size 5.24e9 × 15, mie, mie w/ multi q, mie w/ sample init, and mie w/ sample init and multi q.)
Comparing different MIE methods on large/LARGE ℓ1 regression problems.
• 35. Conclusion
Randomized regression in parallel and distributed environments:
  different design principles for high-precision versus low-precision
  Least Squares Approximation
  Least Absolute Deviations
  Quantile Regression
These algorithms require more computation than traditional matrix algorithms, but they have better communication profiles.
  On MPI: Chebyshev semi-iterative method vs. LSQR.
  On MapReduce: the method of inscribed ellipsoids with multiple queries.
Look beyond FLOPS in parallel and distributed environments.