In this presentation we discuss when deep learning techniques yield the best results from a practitioner's viewpoint. Should we apply deep learning techniques to every machine learning problem? What characteristics make an application well suited to deep learning? Does more data automatically imply better results regardless of the algorithm or model? Does "automated feature learning" obviate the need for data preprocessing and feature design?
Deep Learning For Practitioners, Lecture 2: Selecting the right applications for deep learning
1. Deep Learning For Practitioners
Lecture 2: Which applications benefit from deep learning?
Anantharaman Narayana Iyer
deeplearning.ananth@gmail.com
17th June 2014
Note: Notes that contain code examples for these slides and detailed analysis will be published separately later.
2. Review of previous lecture
• Deep learning as a major machine learning discipline has received phenomenal attention of late due to:
– Breakthrough results reported by the research community for certain classes of applications, bettering the current state of the art
– Substantial investments by technology companies such as Google, Facebook, Microsoft and IBM
• While there is no single unique architecture, deep networks are typically built using some variant of Autoencoders or Restricted Boltzmann Machines, with key characteristics of:
– Deep architecture: multiple layers performing complex, nonlinear computations, cascading the layerwise outputs.
– Automated feature extraction: each layer produces as its output an abstracted form of its inputs (e.g. edges from raw pixels). One may add a classifier layer (e.g. an SVM) on top of the abstracted features and view the classification as being done on the most abstract features automatically generated by the system. (An example with code is illustrated in the next lecture; a minimal sketch follows below.)
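To make the layerwise feature-extraction idea concrete, here is a minimal sketch of a single autoencoder layer trained with plain gradient descent in NumPy. It assumes a sigmoid activation and a squared-error reconstruction loss; all names, sizes and hyperparameters are illustrative assumptions, not the lecture's reference code.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.random((100, 20))                    # 100 samples, 20 raw input features

    n_in, n_hidden = X.shape[1], 8               # compress 20 raw inputs to 8 abstract features
    W1 = rng.normal(0, 0.1, (n_in, n_hidden))    # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_in))    # decoder weights
    b2 = np.zeros(n_in)

    lr = 0.5
    for epoch in range(200):
        H = sigmoid(X @ W1 + b1)                 # encode: abstract features
        X_hat = sigmoid(H @ W2 + b2)             # decode: reconstruct the inputs

        # Gradients of 0.5 * ||X_hat - X||^2 back through the sigmoids.
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        d_hid = (d_out @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ d_out / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / len(X)
        b1 -= lr * d_hid.mean(axis=0)

    # H is the learned abstract representation; a classifier (e.g. an SVM)
    # could be fitted on these features, and a second autoencoder could be
    # trained on H to form the next layer.
    features = sigmoid(X @ W1 + b1)
    print(np.mean((sigmoid(features @ W2 + b2) - X) ** 2))   # reconstruction error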
3. Looking through the practitioner's prism
• To address real-world problems, practitioners need to be aware of where deep learning yields the best results, the practical considerations and limitations, and when not to use it.
• This requires looking at research results and other claims from a practical perspective and steering clear of common misconceptions.
4. “If all you have is a hammer, everything looks like a nail”
• Deep learning has proved its potential in some application domains (e.g. computer vision, speech recognition) and holds early promise in several other areas (e.g. natural language processing), but it is not a universal tool that provides the best result for “any” AI task.
• When does it have the potential to perform best?
– When the structure of the problem being solved naturally maps to a multi-layer architecture
• If the problem we are trying to solve can be decomposed into processing hierarchical abstract features, and these features are derivable from the input data through a set of potentially nonlinear transformations, a deep learning based solution might be effective.
• As a corollary, problems that don't exhibit a multi-layer structure may not see much incremental benefit compared to traditional methods.
– Data availability
• While traditional architectures require expert-designed features, deep learning systems learn these features automatically, given the raw input.
• In order to learn the features, extensive unsupervised pretraining using large volumes of data is often required. Hence any advanced solution based on deep learning is likely to require the availability of such data.
5. “More data or better models?”
• Data vs. algorithm: research shows that as a system is trained with more data, its performance asymptotically approaches the same level regardless of the model.
• One may be led to believe that shallow networks, trained with huge data, might equal the performance of deep networks.
– Unfortunately, much of the data available on the web is unlabeled, and without an effective unsupervised training model that data is not useful. Deep networks, with their unsupervised pretraining phase, can leverage such data better (see the sketch below).
• Another notion could be that any algorithm or model selection for a deep network is good enough if you give it a huge volume of data.
– Choosing an optimal algorithm and design is critical, as deep networks are resource-heavy due to their multiple layers and weights. A good intuition about the problem structure is important for making the right model choices.
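As a rough illustration of how unsupervised pretraining lets a deep network exploit unlabeled data, the sketch below greedily stacks two autoencoder layers on an unlabeled corpus. The train_autoencoder helper and all sizes are hypothetical, following the single-layer example on slide 2.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_autoencoder(X, n_hidden, lr=0.5, epochs=200, seed=0):
        # Train one sigmoid autoencoder layer (as in the slide 2 sketch)
        # and return only the encoder parameters.
        rng = np.random.default_rng(seed)
        W1 = rng.normal(0, 0.1, (X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
        W2 = rng.normal(0, 0.1, (n_hidden, X.shape[1])); b2 = np.zeros(X.shape[1])
        for _ in range(epochs):
            H = sigmoid(X @ W1 + b1)
            X_hat = sigmoid(H @ W2 + b2)
            d_out = (X_hat - X) * X_hat * (1 - X_hat)
            d_hid = (d_out @ W2.T) * H * (1 - H)
            W2 -= lr * H.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
            W1 -= lr * X.T @ d_hid / len(X); b1 -= lr * d_hid.mean(axis=0)
        return W1, b1

    # A large unlabeled corpus drives the pretraining; no labels needed yet.
    rng = np.random.default_rng(1)
    X_unlabeled = rng.random((1000, 30))

    encoders, H = [], X_unlabeled
    for n_hidden in (16, 8):          # two stacked layers, trained greedily
        W, b = train_autoencoder(H, n_hidden)
        encoders.append((W, b))
        H = sigmoid(H @ W + b)        # feed each layer's abstractions upward

    # H now holds the most abstract features; a supervised classifier can be
    # fine-tuned on a much smaller labeled set using these features.
    print(H.shape)                    # (1000, 8)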
6. Automated Feature Learning and data preprocessing
Though deep learning systems extract features automatically, the task of data preprocessing is still non-trivial.
– The input data should be complete enough that the features relevant to the given problem can be extracted.
• Consider the example of detecting anomalies in the operation of a nuclear reactor. The input given to a deep learning system should include signals from all the relevant sensors; missing any of them may result in inadequate performance.
– The optimum size of the input data adequate for the job needs to be determined.
• Suppose we need to perform face detection, given input images. What should the input size be: 10 x 10 or 100 x 100 pixels? High dimensionality increases the number of model parameters substantially, requiring more compute resources (see the sketch below).
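As a back-of-the-envelope check, the snippet below counts the weights in the first fully connected layer of a hypothetical network for the two candidate input sizes; the hidden-layer width is an arbitrary assumption.

    # Weights in the first fully connected layer of a hypothetical network:
    # every input pixel connects to every hidden unit (biases ignored).
    n_hidden = 500                      # assumed hidden-layer width

    for side in (10, 100):
        n_inputs = side * side          # flattened grayscale image
        print(f"{side}x{side} input -> {n_inputs * n_hidden:,} weights")

    # 10x10 input   ->    50,000 weights
    # 100x100 input -> 5,000,000 weights: a 100x increase in parameters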
– The input vector representation must be determined.
• E.g., for an NLP problem, words from a vocabulary V may be represented in “one-hot” form, where each word in V is represented by a position. Here the number of features for a given word w equals the size of the vocabulary |V|, and a sentence with k words will be represented as a k * |V| sized input vector. When the vocabulary becomes large (say over 10000 words), this representation increases the dimensionality substantially (see the sketch below).
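The following sketch builds the k * |V| one-hot representation for a toy vocabulary, showing how the dimensionality scales; the vocabulary and sentence are invented for illustration.

    import numpy as np

    # Toy vocabulary; a realistic one would hold tens of thousands of words.
    vocab = ["the", "reactor", "is", "running", "hot"]
    index = {word: i for i, word in enumerate(vocab)}

    def one_hot_sentence(words, index):
        # Each word maps to a |V|-sized vector with a single 1; the sentence
        # is the concatenation, giving a k * |V| sized input vector.
        V = len(index)
        vec = np.zeros(len(words) * V)
        for pos, word in enumerate(words):
            vec[pos * V + index[word]] = 1.0
        return vec

    sentence = ["the", "reactor", "is", "hot"]
    x = one_hot_sentence(sentence, index)
    print(x.shape)   # (20,) here; with |V| = 10000 and k = 4 it would be (40000,)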
– For many problems, data cleaning and preprocessing are still required.
• E.g., for many NLP problems, better performance may be obtained more easily through some preprocessing steps (such as stopword removal, stemming, etc.) than by letting the deep learning system handle the data in its raw form (see the sketch below).
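A minimal preprocessing sketch using the NLTK library, assuming nltk is installed and its stopword list has been downloaded; the sample text is invented.

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    nltk.download("stopwords", quiet=True)   # one-time fetch of the stopword list

    stop = set(stopwords.words("english"))
    stemmer = PorterStemmer()

    text = "the reactors were running hotter than the operators expected"
    tokens = text.split()

    # Drop stopwords, then reduce the remaining words to their stems.
    cleaned = [stemmer.stem(t) for t in tokens if t not in stop]
    print(cleaned)   # stems of the content words, e.g. 'run' for 'running'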