Top 10 13 Deep Learning Tips & Tricks
Arno Candel, PhD
Chief Architect, H2O.ai
@ArnoCandel
Tip #1: Understand Model Complexity

[Diagram: example network with [4,3] hidden neurons mapping inputs Age, Income, and FICO score through hidden Layer 1 and Layer 2 to outputs Fraud/Legit; training is CPU-intensive]
• Neurons are connected layer-by-layer in a feed-forward, directed acyclic graph
• Each connection is a floating-point number (weight)
• A neuron's activation/output is a non-linear transform of the weighted sum of its inputs
• Model complexity (and training time) grows with the total number of weights
• Training is iterative: you can stop at any point and continue until convergence
• Multi-threading leads to race conditions, so results are not reproducible
Tip #1: Understand Model Complexity

Given data: 10 billion rows, 1111 columns
1110 predictor columns (1100 numerical, 10 categorical with 400 levels each), 1 binary response column

Question: Which model is more complex (i.e., is bigger, has more weights)?
Model 1: [400,400,400,400] hidden neurons
Model 2: [500] hidden neurons
Tip #1: Understand Model Complexity

Given data: 10 billion rows, 1111 columns
1110 predictor columns (1100 numerical, 10 categorical with 400 levels each), 1 binary response column

Model 1: [5100,400,400,400,400,2] neurons and 5100x400 + 400x400 + 400x400 + 400x400 + 400x2 = 2.52M weights
Model 2: [5100,500,2] neurons and 5100x500 + 500x2 = 2.55M weights

Answer: Model 2. Why? Categoricals are one-hot expanded into 4000 "dummy" variables (0/1), and hence there are a total of 1100 + 4000 = 5100 input neurons.

Model size depends on features
Tip #1: Understand Model Complexity

Given data: 10 billion rows, 1111 columns
1110 predictor columns (1100 numerical, 10 categorical with 400 levels each), 1 binary response column

Question: How much memory does H2O Deep Learning need to train such a model (2.5M weights) on 10 billion rows for 100 epochs?

Answer: Model memory usage is small and constant over time:
2.5M x 4 bytes (float) x 3 (adaptive learning rate) = 30MB
(more training -> hopefully better weights, but not more weights)

Model size is independent of #rows and training time
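These numbers are easy to sanity-check. A minimal sketch in plain Python (the helper is illustrative, not from the deck; the x3 factor is the extra per-weight state kept by the adaptive learning rate, as stated above):

```python
# Count connections (weights, ignoring biases) in a fully-connected feed-forward net.
def n_weights(layers):
    return sum(a * b for a, b in zip(layers, layers[1:]))

inputs = 1100 + 10 * 400  # numeric columns + one-hot expanded categoricals = 5100

print(n_weights([inputs, 400, 400, 400, 400, 2]))  # 2520800 -> ~2.52M (Model 1)
print(n_weights([inputs, 500, 2]))                 # 2551000 -> ~2.55M (Model 2)

# 4-byte floats, x3 for the adaptive-learning-rate state => ~30 MB, independent of #rows
print(2.55e6 * 4 * 3 / 1e6, "MB")
```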
Tip #2: Establish a Baseline on Holdout Data

For a given dataset (with proper holdout sets), do the following first (see the sketch after this list):
• Train default GLM, DRF, and GBM models and inspect convergence plots, train/test metrics, and variable importances
• Train default DL models, but set the hidden neurons (and maybe epochs):
  - hidden=[200,200]           "I Feel Lucky"
  - hidden=[512]               "Eagle Eye"
  - hidden=[64,64,64]          "Puppy Brain"
  - hidden=[32,32,32,32,32]    "Junior Chess Master"
• Develop a feel for the problem and the holdout performance of the different models

Develop a Feel
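As a concrete starting point, a minimal baseline sketch with the h2o Python package; the CSV path, response column name, and split ratio are placeholders for your own data, and later sketches reuse the train/valid/x/y names defined here:

```python
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
df = h2o.import_file("mydata.csv")  # placeholder path
y = "response"                      # placeholder binary response column
df[y] = df[y].asfactor()
train, valid = df.split_frame(ratios=[0.8], seed=42)
x = [c for c in df.columns if c != y]

# Default GLM, DRF, GBM baselines
for est in (H2OGeneralizedLinearEstimator(family="binomial"),
            H2ORandomForestEstimator(),
            H2OGradientBoostingEstimator()):
    est.train(x=x, y=y, training_frame=train, validation_frame=valid)
    print(type(est).__name__, est.auc(valid=True))

# Default DL models, varying only the hidden layout (and capping epochs)
for hidden in ([200, 200], [512], [64, 64, 64], [32, 32, 32, 32, 32]):
    dl = H2ODeepLearningEstimator(hidden=hidden, epochs=10)
    dl.train(x=x, y=y, training_frame=train, validation_frame=valid)
    print(hidden, dl.auc(valid=True))
```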
Tip #3: Inspect Models in Flow

• Inspect the model in Flow during training: getModel 'model_id'
• If the model is wrong (wrong architecture, response, parameters, etc.), cancel it
• If the model is taking too long, cancel it and decrease model complexity
• If the model is performing badly, cancel it and increase model complexity

Cancel early, cancel often
Tip #4: Use Early Stopping! (On by default)

Before: the model trains too long, but at least overwrite_with_best_model=true prevents overfitting (it returns the model with the lowest validation error).

Now: specify an additional convergence criterion, e.g. stopping_rounds=5, stopping_metric="MSE", stopping_tolerance=1e-3, to stop as soon as the moving average (length 5) of the validation MSE does not improve by at least 0.1% for 5 consecutive scoring events.

[Plots: training and validation error vs. training time/epochs on the Higgs dataset; the best model is marked at the lowest validation error (overwrite_with_best_model=true)]

Use Flow to inspect the model. Early stopping saves tons of time.
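A minimal sketch of those settings, reusing train/valid/x/y from the baseline sketch above (the epoch cap is an arbitrary upper bound):

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    hidden=[200, 200],
    epochs=1000,                     # generous cap; early stopping will cut it short
    overwrite_with_best_model=True,  # default: return the lowest-validation-error model
    stopping_rounds=5,               # moving average over 5 scoring events
    stopping_metric="MSE",
    stopping_tolerance=1e-3,         # must improve by at least 0.1% to keep training
)
dl.train(x=x, y=y, training_frame=train, validation_frame=valid)
print(dl.scoring_history().tail())   # see where training actually stopped
```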
Tip #5: Control Scoring Overhead

By default, scoring is limited to 10% of the runtime, but if you provide a large validation dataset, then even scoring for the first time can take a long time. Here are options to control this overhead:
• Provide a smaller validation dataset for scoring during training
• Sample the validation dataset with 'score_validation_samples=N'
• If multi-class or imbalanced, set 'score_validation_sampling="Stratified"'
• Score less often with 'score_duty_cycle=0.05', etc.

The quality of the validation dataset and the frequency of scoring can affect early stopping. Especially if N-fold cross-validation is disabled, try to provide a validation dataset for accurate early stopping.

Validation data determines scoring speed and the early stopping decision
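A minimal sketch of these scoring controls (the sample size is a placeholder; frames as in the baseline sketch above):

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    hidden=[200, 200],
    score_validation_samples=10000,          # score on a 10k-row sample of valid
    score_validation_sampling="Stratified",  # keep class balance in the sample
    score_duty_cycle=0.05,                   # spend at most ~5% of runtime scoring
)
dl.train(x=x, y=y, training_frame=train, validation_frame=valid)
```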
Tip #6: Use N-fold Cross-Validation

Estimate your model performance well

In H2O, N-fold cross-validation trains N+1 models:
• N models, each with a different 1/N-th of the training data held out as validation data
• 1 model on the full training data

Early stopping and N-fold CV are especially useful together, as the optimal number of epochs is estimated from the N cross-validation models (even if a validation dataset is provided).
• Early stopping automatically tunes the number of epochs
• Model performance is estimated with training holdout data
• The model is built on all of the training data
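A minimal 5-fold CV sketch under the same assumptions as the baseline above:

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(hidden=[200, 200], nfolds=5, seed=42,
                              stopping_rounds=5, stopping_metric="MSE",
                              stopping_tolerance=1e-3)
dl.train(x=x, y=y, training_frame=train)  # no validation frame needed with CV
print(dl.auc(xval=True))                  # performance estimated from the CV holdouts
```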
Tip #7: Use Regularization

Overfitting is easy, generalization is art

[Plots: training and validation error vs. training time on the Higgs dataset for two configurations]
Left: Rectifier, hidden=[50,50]
Right: RectifierWithDropout, hidden=[50,50], l1=1e-4, l2=1e-4, hidden_dropout_ratios=[0.2,0.3]
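A sketch of the regularized configuration from the right-hand plot (frames as in the baseline sketch above):

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    activation="RectifierWithDropout",
    hidden=[50, 50],
    l1=1e-4, l2=1e-4,
    hidden_dropout_ratios=[0.2, 0.3],  # one dropout ratio per hidden layer
)
dl.train(x=x, y=y, training_frame=train, validation_frame=valid)
```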
Tip #8: Perform Hyperparameter Search

Just need to find one of the many good models

There are a lot of hyperparameters for H2O Deep Learning. Many are for expert-level fine control and do not significantly affect model performance. Rectifier / RectifierWithDropout is recommended for most problems (the only difference is the default value of hidden_dropout_ratios: 0 / 0.5).

Main parameters to tune (I prefer random over grid search) are:
• hidden (try 2 to 5 layers deep, 10 to 2000 neurons per layer)
• hidden_dropout_ratios and input_dropout_ratio
• l1/l2
• adaptive_rate
  - true: rho, epsilon
  - false: rate, rate_annealing, rate_decay, momentum_start, momentum_stable, momentum_ramp
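A minimal random-search sketch over some of these parameters; the value grids are illustrative, and hidden_dropout_ratios is omitted because its length must match the chosen hidden layout:

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.grid.grid_search import H2OGridSearch

hyper_params = {
    "hidden": [[200, 200], [512], [64, 64, 64], [32, 32, 32, 32, 32]],
    "input_dropout_ratio": [0, 0.05, 0.1],
    "l1": [0, 1e-5, 1e-4],
    "l2": [0, 1e-5, 1e-4],
}
search_criteria = {"strategy": "RandomDiscrete", "max_models": 20, "seed": 42}

grid = H2OGridSearch(
    H2ODeepLearningEstimator(epochs=50, stopping_rounds=5,
                             stopping_metric="MSE", stopping_tolerance=1e-3),
    hyper_params=hyper_params, search_criteria=search_criteria)
grid.train(x=x, y=y, training_frame=train, validation_frame=valid)
best = grid.get_grid(sort_by="mse", decreasing=False).models[0]
```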
Tip #9: Use Checkpointing

Any H2O Deep Learning model can be stored to disk and then later restarted, with either the same or new training data.
• It's OK to cancel an H2O Deep Learning model at any time during training; it can be restarted
• Store your favorite models to disk for later use
• Continue training from previous model checkpoints (architecture, data types, etc. must match; many parameters are allowed to change)
• Start multiple new models from the same checkpoint, with different regularization and learning-rate parameters, to speed up hyperparameter search

Checkpointing enables fast exploration
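A save/restore/continue sketch (the directory path is a placeholder; frames as in the baseline sketch above):

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(model_id="dl_ckpt", hidden=[200, 200], epochs=10)
dl.train(x=x, y=y, training_frame=train, validation_frame=valid)

path = h2o.save_model(dl, path="/tmp/h2o_models", force=True)  # placeholder dir
restored = h2o.load_model(path)

# Continue from the checkpoint: architecture must match, epochs may grow,
# and e.g. regularization is allowed to change.
more = H2ODeepLearningEstimator(checkpoint=restored.model_id,
                                hidden=[200, 200], epochs=20, l1=1e-5)
more.train(x=x, y=y, training_frame=train, validation_frame=valid)
```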
Tip #10: Tune Communication on Multi-Node

[Diagram: nodes/JVMs each run computation plus an HTTPD and H2O's K-V store (a distributed in-memory non-blocking hash map); the model can be queried and displayed via JSON/WWW. Starting from the initial model (weights and biases w), each node produces local weights w1..w4, which are averaged pairwise into the updated model w* = (w1+w2+w3+w4)/4]

Map: training. Each node trains independently; this parallelizes well.
Reduce: model averaging. This can improve accuracy, but takes time (limited to ~5% of time by default).

Know your 'target_ratio_comm_to_comp' value
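A hedged sketch of the relevant knobs (the 0.05 mirrors the ~5% default stated above; frames as in the baseline sketch):

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(
    hidden=[200, 200],
    target_ratio_comm_to_comp=0.05,  # ~5% of time on model averaging, per the slide
    train_samples_per_iteration=-2,  # auto-tune rows per Map/Reduce iteration
)
dl.train(x=x, y=y, training_frame=train)
```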
Tip #11: Know your Data (Sparsity/Categoricals)

If the number of input neurons is >> 1000, then training can be inefficient and slow, especially if there are many sparse features (e.g., categorical predictors) that rarely activate neurons. It is often more efficient (and faster) to train on fewer (dense, rich) predictors. Try the following remedies:
• Build a tiny DL model and use h2o.deepfeatures() to extract lower-dimensional features
• Ignore categorical columns with high factor counts
• Use a random projection of all categorical features into N-dimensional space by setting 'max_categorical_features=N'
• Use GLRM to reduce the dimensionality of the dataset (and make sure to apply the same GLRM model to the test set)
• If the dataset is simply wide (few or no categorical features), then the chance of it being sparse is high. If so, set 'sparse=true', which can result in faster training.

Know your data
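A sketch of two of these remedies; the hidden size, layer index, and projection dimension are placeholders, and frames are as in the baseline sketch above:

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

# Tiny model whose first hidden layer becomes a dense, lower-dimensional feature space
tiny = H2ODeepLearningEstimator(hidden=[32], epochs=5)
tiny.train(x=x, y=y, training_frame=train)
dense_train = tiny.deepfeatures(train, 0)  # activations of hidden layer 0

# Or hash all categoricals into a fixed-size space, and enable sparse handling
dl = H2ODeepLearningEstimator(hidden=[200, 200],
                              max_categorical_features=1000,
                              sparse=True)  # helps if the data is wide and sparse
dl.train(x=x, y=y, training_frame=train)
```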
Tip #12: Advanced Math

For regression problems, it might make sense to apply different loss functions and distributions:
• Gaussian distribution: squared error loss, sensitive to outliers
• Laplace distribution: absolute error loss, more robust to outliers
• Huber loss: a combination of squared error & absolute error
• Poisson distribution (e.g., number of claims in a time period)
• Gamma distribution (e.g., size of insurance claims)
• Tweedie distribution (compound Poisson-Gamma)

Also, H2O Deep Learning supports:
• Offsets (per-row)
• Observation weights (per-row)

Know your math
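A sketch of a Laplace-loss regression with per-row offsets and weights; the response and the offset/weight column names are placeholders, not from the deck:

```python
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dl = H2ODeepLearningEstimator(hidden=[200, 200], distribution="laplace")
dl.train(x=x, y="claim_size",              # placeholder numeric response
         training_frame=train,
         offset_column="exposure_offset",  # placeholder per-row offset
         weights_column="obs_weight")      # placeholder per-row weight
```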
Tip #13: Do Ensembling

To obtain the highest accuracy, it's often more effective to average a few fast and diverse models than to build one large model. Ideally, use different network architectures and regularization schemes to keep the variance high before ensembling.

Given a set of (diverse, and ideally good or great) models, use:
• Blending (just average the models)
• Gut-feel Blending (assign your own weights that add up to 1)
• SuperLearner (optimized stacking with a meta-learner; see Erin's talk)
• Add other models into the mix (GBM, DRF, GLM, etc.)

Ensembles rule the leaderboards
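A minimal blending sketch; dl_a, dl_b, gbm, drf are assumed trained binomial models and test an assumed H2OFrame, none of which appear in the deck:

```python
# Average the predicted positive-class probabilities of a few diverse models
preds = [m.predict(test)["p1"] for m in (dl_a, dl_b, gbm, drf)]
blend = (preds[0] + preds[1] + preds[2] + preds[3]) / len(preds)

# Gut-feel blending: your own weights, summing to 1
weights = [0.4, 0.3, 0.2, 0.1]
gut = preds[0] * weights[0]
for p, w in zip(preds[1:], weights[1:]):
    gut = gut + p * w
```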
Summary: Checklist for Success with H2O Deep Learning

Understand Model Complexity
Establish a Baseline on Holdout Data
Inspect Models in Flow
Use Early Stopping
Control Scoring Overhead
Use N-fold Cross-Validation
Use Regularization
Perform Hyperparameter Search
Use Checkpointing
Tune Communication on Multi-Node
Know your Data (Sparsity/Categoricals)
Know your Math
Do Ensembling
