In this presentation we discuss when deep learning techniques yield the best results from a practitioner's viewpoint. Should we apply deep learning techniques to every machine learning problem? What characteristics make an application well suited to deep learning? Does more data automatically imply better results regardless of the algorithm or model? Does "automated feature learning" obviate the need for data preprocessing and feature design?
Deep Learning For Practitioners, Lecture 2: Selecting the right applications for deep learning
1. Deep Learning For Practitioners
Lecture 2: Which applications benefit from deep learning?
Anantharaman Narayana Iyer
deeplearning.ananth@gmail.com
17th June 2014
Note: Notes that contain code examples for these slides and detailed analysis will be published separately later.
2. Review of previous lecture
• Deep learning as a major machine learning discipline has received phenomenal attention of late due to:
– Breakthrough results reported by the research community for certain classes of applications, bettering the current state of the art
– Substantial investments by technology companies such as Google, Facebook, Microsoft and IBM
• While there is no single unique architecture, deep networks are typically built using some variant of Autoencoders or Restricted Boltzmann Machines, with key characteristics of:
– Deep architecture: multiple layers performing complex, nonlinear computations, cascading the layerwise outputs.
– Automated feature extraction: each layer produces as its output an abstracted form of its inputs (e.g. edges from raw pixels). One may add a classifier layer (e.g. an SVM) on top of the abstracted features and view the classification as being done on the most abstract features automatically generated by the system. (An example with code is illustrated in the next lecture; a minimal sketch follows below.)
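To make the layerwise feature-extraction idea concrete, here is a minimal sketch of a single autoencoder layer trained with plain gradient descent in NumPy. It assumes a sigmoid activation and a squared-error reconstruction loss; all names, sizes and hyperparameters are illustrative assumptions, not the lecture's reference code.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.random((100, 20))                    # 100 samples, 20 raw input features

    n_in, n_hidden = X.shape[1], 8               # compress 20 raw inputs to 8 abstract features
    W1 = rng.normal(0, 0.1, (n_in, n_hidden))    # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_in))    # decoder weights
    b2 = np.zeros(n_in)

    lr = 0.5
    for epoch in range(200):
        H = sigmoid(X @ W1 + b1)                 # encode: abstract features
        X_hat = sigmoid(H @ W2 + b2)             # decode: reconstruct the inputs

        # Gradients of 0.5 * ||X_hat - X||^2 back through the sigmoids.
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        d_hid = (d_out @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ d_out / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / len(X)
        b1 -= lr * d_hid.mean(axis=0)

    # H is the learned abstract representation; a classifier (e.g. an SVM)
    # could be fitted on these features, and a second autoencoder could be
    # trained on H to form the next layer.
    features = sigmoid(X @ W1 + b1)
    print(np.mean((sigmoid(features @ W2 + b2) - X) ** 2))   # reconstruction error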
3. Looking through the practitioner's prism
• To address real-world problems, practitioners need to be aware of where deep learning yields the best results, the practical considerations and limitations, and when not to use it.
• This requires looking at research results and other claims from a practical perspective and steering clear of common misconceptions.
4. “If all you have is a hammer, everything looks like a nail”
• Deep learning has proved its potential in some application domains (e.g. computer vision, speech recognition) and holds early promise in several other areas (e.g. natural language processing), but it is not a universal tool that provides the best result for “any” AI task.
• When does it have the potential to perform best?
– When the structure of the problem being solved naturally maps to a multi-layer architecture
• If the problem we are trying to solve can be decomposed into processing hierarchical abstract features, and these features are derivable from the input data through a set of potentially nonlinear transformations, a deep learning based solution might be effective.
• As a corollary, problems that don't exhibit a multi-layer structure may not see much incremental benefit compared to traditional methods.
– Data availability
• While traditional architectures require expert-designed features, deep learning systems learn these features automatically, given the raw input.
• In order to learn the features, extensive unsupervised pretraining using large volumes of data is often required. Hence any advanced solution based on deep learning is likely to require the availability of such data.
5. “More data or better models?”
• Data vs. algorithm: research shows that as a system is trained with more data, its performance asymptotically approaches the same level regardless of the model.
• One may be led to believe that shallow networks, trained with huge data, might equal the performance of deep networks.
– Unfortunately, much of the data available on the web is unlabeled, and without an effective unsupervised training model that data is not useful. Deep networks, with their unsupervised pretraining phase, can leverage such data better (see the sketch below).
• Another notion could be that any algorithm or model selection for a deep network is good enough if you give it a huge volume of data.
– Choosing an optimal algorithm and design is critical, as deep networks are resource-heavy due to their multiple layers and weights. A good intuition about the problem structure is important for making the right model choices.
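As a rough illustration of how unsupervised pretraining lets a deep network exploit unlabeled data, the sketch below greedily stacks two autoencoder layers on an unlabeled corpus. The train_autoencoder helper and all sizes are hypothetical, following the single-layer example on slide 2.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_autoencoder(X, n_hidden, lr=0.5, epochs=200, seed=0):
        # Train one sigmoid autoencoder layer (as in the slide 2 sketch)
        # and return only the encoder parameters.
        rng = np.random.default_rng(seed)
        W1 = rng.normal(0, 0.1, (X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
        W2 = rng.normal(0, 0.1, (n_hidden, X.shape[1])); b2 = np.zeros(X.shape[1])
        for _ in range(epochs):
            H = sigmoid(X @ W1 + b1)
            X_hat = sigmoid(H @ W2 + b2)
            d_out = (X_hat - X) * X_hat * (1 - X_hat)
            d_hid = (d_out @ W2.T) * H * (1 - H)
            W2 -= lr * H.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
            W1 -= lr * X.T @ d_hid / len(X); b1 -= lr * d_hid.mean(axis=0)
        return W1, b1

    # A large unlabeled corpus drives the pretraining; no labels needed yet.
    rng = np.random.default_rng(1)
    X_unlabeled = rng.random((1000, 30))

    encoders, H = [], X_unlabeled
    for n_hidden in (16, 8):          # two stacked layers, trained greedily
        W, b = train_autoencoder(H, n_hidden)
        encoders.append((W, b))
        H = sigmoid(H @ W + b)        # feed each layer's abstractions upward

    # H now holds the most abstract features; a supervised classifier can be
    # fine-tuned on a much smaller labeled set using these features.
    print(H.shape)                    # (1000, 8)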
6. Automated Feature Learning and data preprocessing
Though deep learning systems extract features automatically, the task of data preprocessing is still non-trivial.
– The input data should be complete enough that the features relevant to the given problem can be extracted.
• Consider the example of detecting anomalies in the operation of a nuclear reactor. The input given to a deep learning system should include signals from all the relevant sensors; missing any of them may result in inadequate performance.
– The optimum size of the input data adequate for the job needs to be determined.
• Suppose we need to perform face detection, given input images. What should the input size be: 10 x 10 or 100 x 100 pixels? High dimensionality increases the number of model parameters substantially, requiring more compute resources (see the sketch below).
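As a back-of-the-envelope check, the snippet below counts the weights in the first fully connected layer of a hypothetical network for the two candidate input sizes; the hidden-layer width is an arbitrary assumption.

    # Weights in the first fully connected layer of a hypothetical network:
    # every input pixel connects to every hidden unit (biases ignored).
    n_hidden = 500                      # assumed hidden-layer width

    for side in (10, 100):
        n_inputs = side * side          # flattened grayscale image
        print(f"{side}x{side} input -> {n_inputs * n_hidden:,} weights")

    # 10x10 input   ->    50,000 weights
    # 100x100 input -> 5,000,000 weights: a 100x increase in parameters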
– The input vector representation must be determined.
• E.g., for an NLP problem, words from a vocabulary V may be represented in “one-hot” form, where each word in V is represented by a position. Here the number of features for a given word w equals the size of the vocabulary |V|, and a sentence with k words will be represented as a k * |V| sized input vector. When the vocabulary becomes large (say over 10000 words), this representation increases the dimensionality substantially (see the sketch below).
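The following sketch builds the k * |V| one-hot representation for a toy vocabulary, showing how the dimensionality scales; the vocabulary and sentence are invented for illustration.

    import numpy as np

    # Toy vocabulary; a realistic one would hold tens of thousands of words.
    vocab = ["the", "reactor", "is", "running", "hot"]
    index = {word: i for i, word in enumerate(vocab)}

    def one_hot_sentence(words, index):
        # Each word maps to a |V|-sized vector with a single 1; the sentence
        # is the concatenation, giving a k * |V| sized input vector.
        V = len(index)
        vec = np.zeros(len(words) * V)
        for pos, word in enumerate(words):
            vec[pos * V + index[word]] = 1.0
        return vec

    sentence = ["the", "reactor", "is", "hot"]
    x = one_hot_sentence(sentence, index)
    print(x.shape)   # (20,) here; with |V| = 10000 and k = 4 it would be (40000,)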
– For many problems, data cleaning and preprocessing are still required.
• E.g., for many NLP problems, better performance may be obtained more easily through some preprocessing steps (such as stopword removal, stemming, etc.) than by letting the deep learning system handle the data in its raw form (see the sketch below).
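A minimal preprocessing sketch using the NLTK library, assuming nltk is installed and its stopword list has been downloaded; the sample text is invented.

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    nltk.download("stopwords", quiet=True)   # one-time fetch of the stopword list

    stop = set(stopwords.words("english"))
    stemmer = PorterStemmer()

    text = "the reactors were running hotter than the operators expected"
    tokens = text.split()

    # Drop stopwords, then reduce the remaining words to their stems.
    cleaned = [stemmer.stem(t) for t in tokens if t not in stop]
    print(cleaned)   # stems of the content words, e.g. 'run' for 'running'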