Troubleshooting Deep Neural Networks - Full Stack Deep Learning


  1. 1. Full Stack Deep Learning Josh Tobin, Sergey Karayev, Pieter Abbeel Troubleshooting Deep Neural Networks
  2. 2. Full Stack Deep Learning Lifecycle of a ML project 2 Planning & 
 project setup Data collection & labeling Training & debugging Deploying & 
 testing Team & hiring Per-project activities Infra & tooling Cross-project infrastructure Troubleshooting - overview
  3. 3. Full Stack Deep Learning Why talk about DL troubleshooting? 3 XKCD, https://xkcd.com/1838/ Troubleshooting - overview
  4. 4. Full Stack Deep Learning Why talk about DL troubleshooting? 4Troubleshooting - overview
  5. 5. Full Stack Deep Learning Why talk about DL troubleshooting? 5Troubleshooting - overview Common sentiment among practitioners: 80-90% of time debugging and tuning 10-20% deriving math or implementing things
  6. 6. Full Stack Deep Learning Why is DL troubleshooting so hard? 6Troubleshooting - overview
  7. 7. Full Stack Deep Learning Suppose you can’t reproduce a result 7 He, Kaiming, et al. "Deep residual learning for image recognition." 
 Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. Your learning curve Troubleshooting - overview
  8. 8. Full Stack Deep Learning Why is your performance worse? 8 Poor model performance Troubleshooting - overview
  9. 9. Full Stack Deep Learning Why is your performance worse? 9 Poor model performance Implementation bugs Troubleshooting - overview
  10. 10. Full Stack Deep Learning Most DL bugs are invisible 10Troubleshooting - overview
  11. 11. Full Stack Deep Learning Most DL bugs are invisible 11Troubleshooting - overview Labels out of order!
  12. 12. Full Stack Deep Learning Why is your performance worse? 12 Poor model performance Implementation bugs Troubleshooting - overview
  13. 13. Full Stack Deep Learning Why is your performance worse? 13 Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  14. 14. Full Stack Deep Learning Models are sensitive to hyperparameters 14 Andrej Karpathy, CS231n course notes Troubleshooting - overview
  15. 15. Full Stack Deep Learning Andrej Karpathy, CS231n course notes He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE international conference on computer vision. 2015. Models are sensitive to hyperparameters 15Troubleshooting - overview
  16. 16. Full Stack Deep Learning Why is your performance worse? 16 Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  17. 17. Full Stack Deep Learning Why is your performance worse? 17 Data/model fit Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  18. 18. Full Stack Deep Learning Data / model fit 18 Yours: self-driving car images Troubleshooting - overview Data from the paper: ImageNet
  19. 19. Full Stack Deep Learning Why is your performance worse? 19 Data/model fit Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  20. 20. Full Stack Deep Learning Why is your performance worse? 20 Dataset construction Data/model fit Poor model performance Implementation bugs Hyperparameter choices Troubleshooting - overview
  21. 21. Full Stack Deep Learning Constructing good datasets is hard 21 Amount of lost sleep over... PhD Tesla Slide from Andrej Karpathy’s talk “Building the Software 2.0 Stack” at TrainAI 2018, 5/10/2018 Troubleshooting - overview
  22. 22. Full Stack Deep Learning Common dataset construction issues • Not enough data • Class imbalances • Noisy labels • Train / test from different distributions • etc 22Troubleshooting - overview
  23. 23. Full Stack Deep Learning Takeaways: why is troubleshooting hard? • Hard to tell if you have a bug • Lots of possible sources for the same degradation in performance • Results can be sensitive to small changes in hyperparameters and dataset makeup 23Troubleshooting - overview
  24. 24. Full Stack Deep Learning Strategy for DL troubleshooting 24Troubleshooting - overview
  25. 25. Full Stack Deep Learning Key mindset for DL troubleshooting 25 Pessimism Troubleshooting - overview
  26. 26. Full Stack Deep Learning Key idea of DL troubleshooting 26 …Start simple and gradually ramp up complexity Since it’s hard to disambiguate errors… Troubleshooting - overview
  27. 27. Full Stack Deep Learning Strategy for DL troubleshooting 27 Tune hyper- parameters Implement & debug Start simple Evaluate Improve model/data Meets re- quirements Troubleshooting - overview
  28. 28. Full Stack Deep Learning Quick summary 28 Start 
 simple • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) Troubleshooting - overview
  29. 29. Full Stack Deep Learning Quick summary 29 Implement & debug Start 
 simple • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once model runs, overfit a single batch & reproduce a known result Troubleshooting - overview
  30. 30. Full Stack Deep Learning Quick summary 30 Implement & debug Start 
 simple Evaluate • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once model runs, overfit a single batch & reproduce a known result • Apply the bias-variance decomposition to decide what to do next Troubleshooting - overview
  31. 31. Full Stack Deep Learning Quick summary 31 Tune hyperparams Implement & debug Start 
 simple Evaluate • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once model runs, overfit a single batch & reproduce a known result • Apply the bias-variance decomposition to decide what to do next • Use coarse-to-fine random searches Troubleshooting - overview
  32. 32. Full Stack Deep Learning Tune hyperparams Quick summary 32 Implement & debug Start 
 simple Evaluate Improve model/data • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once model runs, overfit a single batch & reproduce a known result • Apply the bias-variance decomposition to decide what to do next • Use coarse-to-fine random searches • Make your model bigger if you underfit; add data or regularize if you overfit Troubleshooting - overview
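The "coarse-to-fine random search" mentioned in the summary can be sketched as below; `sample_config` and the exponent ranges are illustrative assumptions, not part of the lecture's tooling. The key idea is to sample hyperparameters log-uniformly over a wide range first, then narrow the ranges around the best coarse result.

```python
import random

random.seed(0)

def sample_config(lr_exp_range, reg_exp_range):
    """Sample one configuration log-uniformly.
    Ranges are given as exponents of 10 (hypothetical helper)."""
    lr = 10 ** random.uniform(*lr_exp_range)
    reg = 10 ** random.uniform(*reg_exp_range)
    return {"lr": lr, "reg": reg}

# Coarse stage: wide log-ranges, a modest number of samples
coarse = [sample_config((-6, -1), (-8, -2)) for _ in range(20)]

# ...train/evaluate each config, then run a fine stage with ranges
# narrowed around the best coarse result (here assumed near lr = 3e-4):
fine = [sample_config((-4, -3), (-6, -4)) for _ in range(20)]
```

Log-uniform sampling matters because learning rates and regularization strengths vary over orders of magnitude; a uniform grid would waste most trials in one decade.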
  33. 33. Full Stack Deep Learning We’ll assume you already have… • Initial test set • A single metric to improve • Target performance based on human-level performance, published results, previous baselines, etc 33Troubleshooting - overview
  34. 34. Full Stack Deep Learning We’ll assume you already have… 34 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - overview • Initial test set • A single metric to improve • Target performance based on human-level performance, published results, previous baselines, etc
  35. 35. Full Stack Deep Learning Questions? 35Troubleshooting - overview
  36. 36. Full Stack Deep Learning Tune hyper- parameters Implement & debug Start simple Evaluate Improve model/data Meets re- quirements Strategy for DL troubleshooting 36Troubleshooting - start simple
  37. 37. Full Stack Deep Learning Starting simple 37 Normalize inputs Choose a simple architecture Simplify the problem Use sensible defaults Steps b a c d Troubleshooting - start simple
  38. 38. Full Stack Deep Learning Demystifying architecture selection 38 Start here Consider using this later Images LeNet-like architecture LSTM with one hidden layer (or temporal convs) Fully connected neural net with one hidden layer ResNet Attention model or WaveNet-like model Problem-dependent Images Sequences Other Troubleshooting - start simple
  39. 39. Full Stack Deep Learning Dealing with multiple input modalities 39 “This” “is” “a” “cat” Input 2 Input 3 Input 1 Troubleshooting - start simple
  40. 40. Full Stack Deep Learning Dealing with multiple input modalities 40 “This” “is” “a” “cat” Input 2 Input 3 Input 1 Troubleshooting - start simple 1. Map each into a lower dimensional feature space
  41. 41. Full Stack Deep Learning Dealing with multiple input modalities 41 ConvNet Flatten “This” “is” “a” “cat” LSTM Input 2 Input 3 Input 1 (64-dim) (72-dim) (48-dim) Troubleshooting - start simple 1. Map each into a lower dimensional feature space
  42. 42. Full Stack Deep Learning Dealing with multiple input modalities 42 ConvNet Flatten Con“cat” “This” “is” “a” “cat” LSTM Input 2 Input 3 Input 1 (64-dim) (72-dim) (48-dim) (184-dim) Troubleshooting - start simple 2. Concatenate
  43. 43. Full Stack Deep Learning Dealing with multiple input modalities 43 ConvNet Flatten Concat “This” “is” “a” “cat” LSTM FC FC Output T/F Input 2 Input 3 Input 1 (64-dim) (72-dim) (48-dim) (184-dim) (256-dim) (128-dim) Troubleshooting - start simple 3. Pass through fully connected layers to output
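The three-step recipe on these slides (encode each modality into a low-dimensional feature space, concatenate, pass through fully connected layers) could be sketched in PyTorch roughly as follows. The 64/72/48-dim encoders and the 256- and 128-dim FC layers follow the slides; the specific conv, embedding, and pooling details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiModalNet(nn.Module):
    """Sketch: encode image / flat vector / token sequence, concat, FC head."""
    def __init__(self, vocab_size=1000):
        super().__init__()
        # 1. Map each input into a lower-dimensional feature space
        self.conv = nn.Sequential(                 # image -> 64-dim
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 64))
        self.flat = nn.Sequential(nn.Flatten(), nn.Linear(72, 72))  # -> 72-dim
        self.embed = nn.Embedding(vocab_size, 32)
        self.lstm = nn.LSTM(32, 48, batch_first=True)               # -> 48-dim
        # 3. Pass through fully connected layers to the output
        self.head = nn.Sequential(
            nn.Linear(64 + 72 + 48, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1))                     # single T/F logit

    def forward(self, image, vector, tokens):
        a = self.conv(image)                       # (B, 64)
        b = self.flat(vector)                      # (B, 72)
        _, (h, _) = self.lstm(self.embed(tokens))
        c = h[-1]                                  # (B, 48) final hidden state
        fused = torch.cat([a, b, c], dim=1)        # 2. Concatenate -> (B, 184)
        return self.head(fused)
```

A forward pass with an image batch, a 72-dim feature vector, and a token sequence yields one logit per example.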
  44. 44. Full Stack Deep Learning Starting simple 44 Normalize inputs Choose a simple architecture Simplify the problem Use sensible defaults Steps b a c d Troubleshooting - start simple
  45. 45. Full Stack Deep Learning Recommended network / optimizer defaults • Optimizer: Adam optimizer with learning rate 3e-4 • Activations: relu (FC and Conv models), tanh (LSTMs) • Initialization: He et al. normal (relu), Glorot normal (tanh) • Regularization: None • Data normalization: None 45Troubleshooting - start simple
  46. 46. Full Stack Deep Learning Definitions of recommended initializers • (n is the number of inputs, m is the number of outputs) • He et al. normal (used for ReLU): N(0, √(2/n)) • Glorot normal (used for tanh): N(0, √(2/(n+m))) 46 Troubleshooting - start simple
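Both initializers amount to sampling weights from a zero-mean normal with the stated standard deviation. A minimal NumPy sketch (the function names are mine, not a library API; deep learning frameworks ship equivalents):

```python
import numpy as np

def he_normal(n_in, n_out, seed=0):
    # He et al. normal: std = sqrt(2 / n), recommended for ReLU layers
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

def glorot_normal(n_in, n_out, seed=0):
    # Glorot normal: std = sqrt(2 / (n + m)), recommended for tanh layers
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out))

W = he_normal(512, 256)
# Empirical std of W should be close to sqrt(2/512) ≈ 0.0625
```

The 2/n scaling keeps the variance of activations roughly constant across ReLU layers, which is why mismatching initializer and activation can stall training.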
  47. 47. Full Stack Deep Learning Normalize inputs Starting simple 47 Choose a simple architecture Simplify the problem Use sensible defaultsb a c Steps d Troubleshooting - start simple
  48. 48. Full Stack Deep Learning Important to normalize scale of input data • Subtract mean and divide by standard deviation • For images, fine to scale values to [0, 1] or [-0.5, 0.5]
 (e.g., by dividing by 255)
 [Careful, make sure your library doesn’t do it for you!] 48Troubleshooting - start simple
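As a concrete sketch of both options (NumPy, on a hypothetical batch of uint8 images):

```python
import numpy as np

# Hypothetical batch of uint8 images, shape (N, H, W, C)
images = np.random.default_rng(0).integers(0, 256, size=(8, 32, 32, 3),
                                           dtype=np.uint8)

# Option 1: cast first, then scale values to [0, 1]
x = images.astype(np.float32) / 255.0

# Option 2: standardize per channel (subtract mean, divide by std)
mean = x.mean(axis=(0, 1, 2), keepdims=True)
std = x.std(axis=(0, 1, 2), keepdims=True)
x_std = (x - mean) / (std + 1e-8)

# Per the slide's warning: check whether your data-loading library is
# already rescaling, or you will end up normalizing twice.
```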
  49. 49. Full Stack Deep Learning Normalize inputs Starting simple 49 Choose a simple architecture Simplify the problem Use sensible defaultsb a c Steps d Troubleshooting - start simple
  50. 50. Full Stack Deep Learning Consider simplifying the problem as well • Start with a small training set (~10,000 examples) • Use a fixed number of objects, classes, image size, etc. • Create a simpler synthetic training set 50Troubleshooting - start simple
  51. 51. Full Stack Deep Learning Simplest model for pedestrian detection • Start with a subset of 10,000 images for training, 1,000 for val, and 500 for test • Use a LeNet architecture with sigmoid cross-entropy loss • Adam optimizer with LR 3e-4 • No regularization 51 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - start simple
  52. 52. Full Stack Deep Learning Normalize inputs Starting simple 52 Choose a simple architecture Simplify the problem Use sensible defaultsb a c Steps d Summary • Start with a simpler version of your problem (e.g., smaller dataset) • Adam optimizer & no regularization • Subtract mean and divide by std, or just divide by 255 (ims) • LeNet, LSTM, or fully connected Troubleshooting - start simple
  53. 53. Full Stack Deep Learning Questions? 53Troubleshooting - start simple
  54. 54. Full Stack Deep Learning Tune hyper- parameters Implement & debug Start simple Evaluate Improve model/data Meets re- quirements Strategy for DL troubleshooting 54Troubleshooting - debug
  55. 55. Full Stack Deep Learning Implementing bug-free DL models 55 Get your model to run Compare to a known result Overfit a single batch b a c Steps Troubleshooting - debug
  56. 56. Full Stack Deep Learning Preview: the five most common DL bugs • Incorrect shapes for your tensors
 Can fail silently! E.g., accidental broadcasting: x.shape = (None,), y.shape = (None, 1), (x+y).shape = (None, None) • Pre-processing inputs incorrectly
 E.g., Forgetting to normalize, or too much pre-processing • Incorrect input to your loss function
 E.g., softmaxed outputs to a loss that expects logits • Forgot to set up train mode for the net correctly
 E.g., toggling train/eval, controlling batch norm dependencies • Numerical instability - inf/NaN
 Often stems from using an exp, log, or div operation 56Troubleshooting - debug
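The accidental-broadcasting bug from the first bullet is easy to reproduce in NumPy (same shapes as on the slide, with `None` replaced by a concrete batch size of 4):

```python
import numpy as np

preds = np.zeros(4)         # shape (4,)   -- e.g., model outputs
labels = np.zeros((4, 1))   # shape (4, 1) -- e.g., labels with a stray axis

diff = preds - labels
# No error is raised: (4,) and (4, 1) broadcast to a (4, 4) matrix,
# so a "per-example" loss would silently average 16 numbers, not 4.
print(diff.shape)  # (4, 4)
```

This is exactly why such bugs fail silently: every op still runs, and the loss still decreases, just on the wrong quantity.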
  57. 57. Full Stack Deep Learning General advice for implementing your model 57 Use off-the-shelf components, e.g., • Keras • tf.layers.dense(…) 
 instead of 
 tf.nn.relu(tf.matmul(W, x)) • tf.losses.cross_entropy(…) 
 instead of writing out the exp Lightweight implementation • Minimum possible new lines of code for v1 • Rule of thumb: <200 lines • (Tested infrastructure components are fine) Build complicated data pipelines later • Start with a dataset you can load into memory Troubleshooting - debug
  58. 58. Full Stack Deep Learning Implementing bug-free DL models 58 Get your model to run Compare to a known result Overfit a single batch b a c Steps Troubleshooting - debug
  59. 59. Full Stack Deep Learning Get your model to run a Shape mismatch Casting issue OOM Other Common issues Recommended resolution Scale back memory intensive operations one-by-one Standard debugging toolkit (Stack Overflow + interactive debugger) Implementing bug-free DL models 59 Step through model creation and inference in a debugger Troubleshooting - debug
  60. 60. Full Stack Deep Learning Get your model to run a Shape mismatch Casting issue OOM Other Common issues Recommended resolution Scale back memory intensive operations one-by-one Standard debugging toolkit (Stack Overflow + interactive debugger) Implementing bug-free DL models 60 Step through model creation and inference in a debugger Troubleshooting - debug
  61. 61. Full Stack Deep Learning Debuggers for DL code 61 • PyTorch: easy, use ipdb • TensorFlow: trickier 
 
 Option 1: step through graph creation Troubleshooting - debug
  62. 62. Full Stack Deep Learning Debuggers for DL code 62 • PyTorch: easy, use ipdb • TensorFlow: trickier 
 
 Option 2: step into training loop Evaluate tensors using sess.run(…) Troubleshooting - debug
  63. 63. Full Stack Deep Learning Debuggers for DL code 63 • PyTorch: easy, use ipdb • TensorFlow: trickier 
 
 Option 3: use tfdb Stops execution at each sess.run(…) and lets you inspect python -m tensorflow.python.debug.examples.debug_mnist --debug Troubleshooting - debug
  64. 64. Full Stack Deep Learning Get your model to run a Shape mismatch Casting issue OOM Other Common issues Recommended resolution Scale back memory intensive operations one-by-one Standard debugging toolkit (Stack Overflow + interactive debugger) Implementing bug-free DL models 64 Step through model creation and inference in a debugger Troubleshooting - debug
  65. 65. Full Stack Deep Learning Shape mismatch Undefined shapes Incorrect shapes Common issues Most common causes • Flipped dimensions when using tf.reshape(…) • Took sum, average, or softmax over wrong dimension • Forgot to flatten after conv layers • Forgot to get rid of extra “1” dimensions (e.g., if shape is (None, 1, 1, 4)) • Data stored on disk in a different dtype than loaded (e.g., stored a float64 numpy array, and loaded it as a float32) Implementing bug-free DL models 65 • Confusing tensor.shape, tf.shape(tensor), tensor.get_shape() • Reshaping things to a shape of type Tensor (e.g., when loading data from a file) Troubleshooting - debug
  66. 66. Full Stack Deep Learning Casting issue Data not in float32 Common issues Most common causes Implementing bug-free DL models 66 • Forgot to cast images from uint8 to float32 • Generated data using numpy in float64, forgot to cast to float32 Troubleshooting - debug
  67. 67. Full Stack Deep Learning OOM Too big a tensor Too much data Common issues Most common causes Duplicating operations Other processes • Other processes running on your GPU • Memory leak due to creating multiple models in the same session • Repeatedly creating an operation (e.g., in a function that gets called over and over again) • Too large a batch size for your model (e.g., during evaluation) • Too large fully connected layers • Loading too large a dataset into memory, rather than using an input queue • Allocating too large a buffer for dataset creation Implementing bug-free DL models 67Troubleshooting - debug
  68. 68. Full Stack Deep Learning Other common errors Other bugs Common issues Most common causes Implementing bug-free DL models 68 • Forgot to initialize variables • Forgot to turn off bias when using batch norm • “Fetch argument has invalid type” - usually you overwrote one of your ops with an output during training Troubleshooting - debug
  69. 69. Full Stack Deep Learning Get your model to run Compare to a known result Overfit a single batch b a c Steps Implementing bug-free DL models 69Troubleshooting - debug
  70. 70. Full Stack Deep Learning Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes Implementing bug-free DL models 70Troubleshooting - debug
  71. 71. Full Stack Deep Learning Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes Implementing bug-free DL models 71Troubleshooting - debug • Flipped the sign of the loss function / gradient • Learning rate too high • Softmax taken over wrong dimension
  72. 72. Full Stack Deep Learning Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes • Numerical issue. Check all exp, log, and div operations • Learning rate too high Implementing bug-free DL models 72Troubleshooting - debug
  73. 73. Full Stack Deep Learning Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes • Data or labels corrupted (e.g., zeroed, incorrectly shuffled, or preprocessed incorrectly) • Learning rate too high Implementing bug-free DL models 73Troubleshooting - debug
  74. 74. Full Stack Deep Learning Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes • Learning rate too low • Gradients not flowing through the whole model • Too much regularization • Incorrect input to loss function (e.g., softmax instead of logits, accidentally add ReLU on output) • Data or labels corrupted Implementing bug-free DL models 74Troubleshooting - debug
  75. 75. Full Stack Deep Learning Overfit a single batch b Error goes up Error oscillates Common issues Error explodes Error plateaus Most common causes • Numerical issue. Check all exp, log, and div operations • Learning rate too high • Data or labels corrupted (e.g., zeroed or incorrectly shuffled) • Learning rate too high • Learning rate too low • Gradients not flowing through the whole model • Too much regularization • Incorrect input to loss function (e.g., softmax instead of logits) • Data or labels corrupted Implementing bug-free DL models 75Troubleshooting - debug • Flipped the sign of the loss function / gradient • Learning rate too high • Softmax taken over wrong dimension
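A minimal pure-NumPy illustration of the overfit-a-single-batch check, using a hypothetical tiny logistic-regression "model" on one fixed, linearly separable batch. A correct implementation should drive the training loss on this one batch toward zero; the failure modes in the table above (plateau, oscillation, explosion) each point to a different bug.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))              # one fixed batch of 16 examples
y = (X[:, 0] > 0).astype(np.float64)      # hypothetical separable labels

w, b, lr = np.zeros(8), 0.0, 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))               # sigmoid outputs
    loss = -np.mean(y * np.log(p + 1e-12)
                    + (1 - y) * np.log(1 - p + 1e-12))   # cross-entropy
    grad = (p - y) / len(y)               # d(loss)/d(logits), averaged
    w -= lr * (X.T @ grad)
    b -= lr * grad.sum()

# loss should approach zero on this single batch; if it plateaus,
# oscillates, or explodes instead, consult the table of causes above.
```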
  76. 76. Full Stack Deep Learning Get your model to run Compare to a known result Overfit a single batch b a c Steps Implementing bug-free DL models 76Troubleshooting - debug
  77. 77. Full Stack Deep Learning Hierarchy of known results 77 More useful Less useful You can: 
 • Walk through code line-by-line and ensure you have the same output • Ensure your performance is up to par with expectations Troubleshooting - debug • Official model implementation evaluated on similar dataset to yours
  78. 78. Full Stack Deep Learning Hierarchy of known results 78 More useful Less useful You can: 
 • Walk through code line-by-line and ensure you have the same output Troubleshooting - debug • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST)
  79. 79. Full Stack Deep Learning • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) • Results from a similar model on a similar dataset • Super simple baselines (e.g., average of outputs or linear regression) Hierarchy of known results 79 More useful Less useful You can: 
 • Same as before, but with lower confidence Troubleshooting - debug
  80. 80. Full Stack Deep Learning Hierarchy of known results 80 More useful Less useful You can: 
 • Ensure your performance is up to par with expectations Troubleshooting - debug • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from a paper (with no code)
  81. 81. Full Stack Deep Learning • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) Hierarchy of known results 81 More useful Less useful You can: 
 • Make sure your model performs well in a simpler setting Troubleshooting - debug
  82. 82. Full Stack Deep Learning • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) • Results from a similar model on a similar dataset • Super simple baselines (e.g., average of outputs or linear regression) Hierarchy of known results 82 More useful Less useful You can: 
 • Get a general sense of what kind of performance can be expected Troubleshooting - debug
  83. 83. Full Stack Deep Learning • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) • Results from a similar model on a similar dataset • Super simple baselines (e.g., average of outputs or linear regression) Hierarchy of known results 83 More useful Less useful You can: 
 • Make sure your model is learning anything at all Troubleshooting - debug
  84. 84. Full Stack Deep Learning Hierarchy of known results 84 More useful Less useful Troubleshooting - debug • Official model implementation evaluated on similar dataset to yours • Official model implementation evaluated on benchmark (e.g., MNIST) • Unofficial model implementation • Results from the paper (with no code) • Results from your model on a benchmark dataset (e.g., MNIST) • Results from a similar model on a similar dataset • Super simple baselines (e.g., average of outputs or linear regression)
  85. 85. Full Stack Deep Learning Summary: how to implement & debug 85 Get your model to run Compare to a known result Overfit a single batch b a c Steps Summary • Look for corrupted data, over- regularization, broadcasting errors • Keep iterating until model performs up to expectations Troubleshooting - debug • Step through in debugger & watch out for shape, casting, and OOM errors
  86. 86. Full Stack Deep Learning Questions? 86Troubleshooting - debug
  87. 87. Full Stack Deep Learning Strategy for DL troubleshooting 87 Tune hyper- parameters Implement & debug Start simple Evaluate Improve model/data Meets re- quirements Troubleshooting - evaluate
  88. 88. Full Stack Deep Learning Bias-variance decomposition 88Troubleshooting - evaluate
  89. 89. Full Stack Deep Learning Bias-variance decomposition 89Troubleshooting - evaluate
  90. 90. Full Stack Deep Learning Bias-variance decomposition 90Troubleshooting - evaluate
  91. 91. Full Stack Deep Learning Bias-variance decomposition 91 [Chart: breakdown of test error by source. Irreducible error 25; + avoidable bias (i.e., underfitting) 2 = train error 27; + variance (i.e., overfitting) 5 = val error 32; + val set overfitting 2 = test error 34] Troubleshooting - evaluate
  92. 92. Full Stack Deep Learning Bias-variance decomposition • Test error = irreducible error + bias + variance + val overfitting • This assumes train, val, and test all come from the same distribution. What if not? 92Troubleshooting - evaluate
  93. 93. Full Stack Deep Learning Handling distribution shift 93 Test data Use two val sets: one sampled from training distribution and one from test distribution Train data Troubleshooting - evaluate
  94. 94. Full Stack Deep Learning The bias-variance tradeoff 94Troubleshooting - evaluate
  95. 95. Full Stack Deep Learning Bias-variance with distribution shift 95Troubleshooting - evaluate
  96. 96. Full Stack Deep Learning Bias-variance with distribution shift 96 [Chart: breakdown of test error by source. Irreducible error 25; + avoidable bias (i.e., underfitting) 2 = train error 27; + variance 2 = train val error 29; + distribution shift 3 = test val error 32; + val overfitting 2 = test error 34] Troubleshooting - evaluate
  97. 97. Full Stack Deep Learning Train, val, and test error for pedestrian detection 97 Error source Value Goal performance 1% Train error 20% Validation error 27% Test error 28% Train - goal = 19%
 (under-fitting) 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - evaluate
  98. 98. Full Stack Deep Learning Train, val, and test error for pedestrian detection 98 Error source Value Goal performance 1% Train error 20% Validation error 27% Test error 28% Val - train = 7%
 (over-fitting) 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - evaluate
  99. 99. Full Stack Deep Learning Train, val, and test error for pedestrian detection 99 Error source Value Goal performance 1% Train error 20% Validation error 27% Test error 28% Test - val = 1%
 (looks good!) 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - evaluate
  100. 100. Full Stack Deep Learning Summary: evaluating model performance 100Troubleshooting - evaluate Test error = irreducible error + bias + variance 
 + distribution shift + val overfitting
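Using the error-source table from the running pedestrian-detection example (goal 1%, train 20%, val 27%, test 28%), the decomposition can be checked with a few lines of arithmetic. Here train and val are assumed to share a distribution, so the distribution-shift term is zero and goal performance stands in for irreducible error.

```python
# Numbers from the running pedestrian-detection example
goal, train_err, val_err, test_err = 0.01, 0.20, 0.27, 0.28

avoidable_bias = train_err - goal      # 0.19 -> severe under-fitting
variance = val_err - train_err         # 0.07 -> moderate over-fitting
val_overfitting = test_err - val_err   # 0.01 -> val set still trustworthy

# The terms sum back to the test error, per the formula above:
total = goal + avoidable_bias + variance + val_overfitting
```

Reading off the largest term (here, avoidable bias) is what tells you which improvement step to prioritize next.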
  101. 101. Full Stack Deep Learning Questions? 101Troubleshooting - evaluate
  102. 102. Full Stack Deep Learning Strategy for DL troubleshooting 102 Tune hyper- parameters Implement & debug Start simple Evaluate Improve model/data Meets re- quirements Troubleshooting - improve
  103. 103. Full Stack Deep Learning Address distribution shift Prioritizing improvements (i.e., applied b-v) 103 Address under-fitting Re-balance datasets 
 (if applicable) Address over-fittingb a c Steps d Troubleshooting - improve
  104. 104. Full Stack Deep Learning Addressing under-fitting (i.e., reducing bias) 104 Try first Try later Troubleshooting - improve A. Make your model bigger (i.e., add layers or use more units per layer) B. Reduce regularization C. Error analysis D. Choose a different (closer to state-of-the art) model architecture (e.g., move from LeNet to ResNet) E. Tune hyper-parameters (e.g., learning rate) F. Add features
105. 105. Full Stack Deep Learning Train, val, and test error for pedestrian detection 105 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy (i.e., 1% error). Add more layers to the ConvNet: goal performance 1% → 1%; train error 20% → 7%; validation error 27% → 19%; test error 28% → 20% Troubleshooting - improve
106. 106. Full Stack Deep Learning Train, val, and test error for pedestrian detection 106 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy (i.e., 1% error). Switch to ResNet-101: goal performance 1% → 1% → 1%; train error 20% → 7% → 3%; validation error 27% → 19% → 10%; test error 28% → 20% → 10% Troubleshooting - improve
107. 107. Full Stack Deep Learning Train, val, and test error for pedestrian detection 107 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy (i.e., 1% error). Tune learning rate: goal performance 1% throughout; train error 20% → 7% → 3% → 0.8%; validation error 27% → 19% → 10% → 12%; test error 28% → 20% → 10% → 12% Troubleshooting - improve
108. 108. Full Stack Deep Learning Prioritizing improvements (i.e., applied b-v) 108 Steps: a. Address under-fitting; b. Address over-fitting; c. Address distribution shift; d. Re-balance datasets (if applicable) Troubleshooting - improve
  109. 109. Full Stack Deep Learning Addressing over-fitting (i.e., reducing variance) 109 Try first Try later Troubleshooting - improve A. Add more training data (if possible!) B. Add normalization (e.g., batch norm, layer norm) C. Add data augmentation D. Increase regularization (e.g., dropout, L2, weight decay) E. Error analysis F. Choose a different (closer to state-of-the-art) model architecture G. Tune hyperparameters H. Early stopping I. Remove features J. Reduce model size
  110. 110. Full Stack Deep Learning Addressing over-fitting (i.e., reducing variance) 110 Try first Try later Not recommended! Troubleshooting - improve A. Add more training data (if possible!) B. Add normalization (e.g., batch norm, layer norm) C. Add data augmentation D. Increase regularization (e.g., dropout, L2, weight decay) E. Error analysis F. Choose a different (closer to state-of-the-art) model architecture G. Tune hyperparameters H. Early stopping I. Remove features J. Reduce model size
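Item D above (increase regularization, e.g. weight decay) can be illustrated with a hand-rolled gradient step. A minimal sketch; the 1-D linear model, data point, and weight-decay value are illustrative, not from the deck:

```python
# L2 weight decay in a plain gradient-descent step for a 1-D linear model.
def sgd_step(w, x, y, lr=0.1, weight_decay=0.0):
    pred = w * x
    grad = 2 * (pred - y) * x      # d/dw of squared error
    grad += 2 * weight_decay * w   # L2 penalty pulls w toward zero
    return w - lr * grad

w_plain, w_reg = 5.0, 5.0
for _ in range(100):
    w_plain = sgd_step(w_plain, x=1.0, y=1.0, weight_decay=0.0)
    w_reg = sgd_step(w_reg, x=1.0, y=1.0, weight_decay=0.5)

print(w_plain, w_reg)  # the regularized weight ends up smaller
```

The regularized run converges to a smaller weight (2/3 rather than 1.0 here), which is exactly the shrinkage effect that reduces variance.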
  111. 111. Full Stack Deep Learning Train, val, and test error for pedestrian detection 111 Error source Value Goal performance 1% Train error 0.8% Validation error 12% Test error 12% 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - improve
112. 112. Full Stack Deep Learning Train, val, and test error for pedestrian detection 112 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy. Increase dataset size to 250,000: goal performance 1% → 1%; train error 0.8% → 1.5%; validation error 12% → 5%; test error 12% → 6% Running example Troubleshooting - improve
113. 113. Full Stack Deep Learning Train, val, and test error for pedestrian detection 113 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy. Add weight decay: goal performance 1% throughout; train error 0.8% → 1.5% → 1.7%; validation error 12% → 5% → 4%; test error 12% → 6% → 4% Running example Troubleshooting - improve
114. 114. Full Stack Deep Learning Train, val, and test error for pedestrian detection 114 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy. Add data augmentation: goal performance 1% throughout; train error 0.8% → 1.5% → 1.7% → 2%; validation error 12% → 5% → 4% → 2.5%; test error 12% → 6% → 4% → 2.6% Running example Troubleshooting - improve
115. 115. Full Stack Deep Learning Train, val, and test error for pedestrian detection 115 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy. Tune num layers, optimizer params, weight initialization, kernel size, weight decay: goal performance 1% throughout; train error 0.8% → 1.5% → 1.7% → 2% → 0.6%; validation error 12% → 5% → 4% → 2.5% → 0.9%; test error 12% → 6% → 4% → 2.6% → 1.0% Running example Troubleshooting - improve
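The data-augmentation step above can be sketched with plain lists standing in for images; a real pipeline would use a library such as torchvision or albumentations. The transforms and jitter range are illustrative:

```python
import random

# Random horizontal flip + brightness jitter on an image represented
# as a list of pixel rows (grayscale values in 0..255).
def augment(image, rng):
    out = [row[:] for row in image]
    if rng.random() < 0.5:                 # random horizontal flip
        out = [row[::-1] for row in out]
    delta = rng.randint(-20, 20)           # brightness jitter
    out = [[min(255, max(0, px + delta)) for px in row] for row in out]
    return out

rng = random.Random(0)
image = [[0, 64, 128], [32, 96, 160]]
augmented = [augment(image, rng) for _ in range(4)]  # 4 extra training views
```

Each call yields a slightly different view of the same labeled example, which is what makes augmentation act like extra training data.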
116. 116. Full Stack Deep Learning Prioritizing improvements (i.e., applied b-v) 116 Steps: a. Address under-fitting; b. Address over-fitting; c. Address distribution shift; d. Re-balance datasets (if applicable) Troubleshooting - improve
  117. 117. Full Stack Deep Learning Addressing distribution shift 117 Try first Try later Troubleshooting - improve A. Analyze test-val set errors & collect more training data to compensate B. Analyze test-val set errors & synthesize more training data to compensate C. Apply domain adaptation techniques to training & test distributions
  118. 118. Full Stack Deep Learning Error analysis 118 Test-val set errors (no pedestrian detected) Train-val set errors (no pedestrian detected) Troubleshooting - improve
  119. 119. Full Stack Deep Learning Error analysis 119 Test-val set errors (no pedestrian detected) Train-val set errors (no pedestrian detected) Error type 1: hard-to-see pedestrians Troubleshooting - improve
  120. 120. Full Stack Deep Learning Error analysis 120 Test-val set errors (no pedestrian detected) Train-val set errors (no pedestrian detected) Error type 2: reflections Troubleshooting - improve
  121. 121. Full Stack Deep Learning Error analysis 121 Test-val set errors (no pedestrian detected) Train-val set errors (no pedestrian detected) Error type 3 (test-val only): night scenes Troubleshooting - improve
122. 122. Full Stack Deep Learning Error analysis 122
Error type | Error % (train-val) | Error % (test-val) | Potential solutions | Priority
1. Hard-to-see pedestrians | 0.1% | 0.1% | Better sensors | Low
2. Reflections | 0.3% | 0.3% | Collect more data with reflections; add synthetic reflections to train set; try to remove with pre-processing; better sensors | Medium
3. Nighttime scenes | 0.1% | 1% | Collect more data at night; synthetically darken training images; simulate night-time data; use domain adaptation | High
Troubleshooting - improve
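The bookkeeping behind a table like this can be sketched as follows: tag each misclassified example with an error type, then compare train-val vs test-val rates to spot types driven by distribution shift. The tags, counts, and 2x threshold are illustrative:

```python
from collections import Counter

def error_rates(tagged_errors, n_examples):
    """Per-type error rate from a list of error-type tags."""
    counts = Counter(tagged_errors)
    return {tag: counts[tag] / n_examples for tag in counts}

# Illustrative tags for 1000 train-val and 1000 test-val examples
train_val = error_rates(["hard_to_see"] * 1 + ["reflections"] * 3 + ["night"] * 1, 1000)
test_val = error_rates(["hard_to_see"] * 1 + ["reflections"] * 3 + ["night"] * 10, 1000)

# Types much worse on test-val than train-val point to distribution shift
shift_driven = [t for t in test_val if test_val[t] > 2 * train_val.get(t, 0)]
print(shift_driven)  # ['night']
```

Here only the nighttime errors spike on test-val, matching the "High" priority in the table above.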
  123. 123. Full Stack Deep Learning Domain adaptation 123 What is it? Techniques to train on “source” distribution and generalize to another “target” using only unlabeled data or limited labeled data When should you consider using it? • Access to labeled data from test distribution is limited • Access to relatively similar data is plentiful Troubleshooting - improve
  124. 124. Full Stack Deep Learning Types of domain adaptation 124 Type Use case Example techniques Supervised You have limited data from target domain • Fine-tuning a pre- trained model • Adding target data to train set Un-supervised You have lots of un- labeled data from target domain • Correlation Alignment (CORAL) • Domain confusion • CycleGAN Troubleshooting - improve
125. 125. Full Stack Deep Learning Prioritizing improvements (i.e., applied b-v) 125 Steps: a. Address under-fitting; b. Address over-fitting; c. Address distribution shift; d. Re-balance datasets (if applicable) Troubleshooting - improve
126. 126. Full Stack Deep Learning Rebalancing datasets • If val (or test-val) looks significantly better than test, you have overfit to the val set • This happens with small val sets or lots of hyperparameter tuning • When it does, recollect val data 126 Troubleshooting - improve
  127. 127. Full Stack Deep Learning Questions? 127Troubleshooting - improve
128. 128. Full Stack Deep Learning Strategy for DL troubleshooting 128 Tune hyper-parameters Implement & debug Start simple Evaluate Improve model/data Meets requirements Troubleshooting - tune
  129. 129. Full Stack Deep Learning Hyperparameter optimization 129 Model & optimizer choices? Network: ResNet - How many layers? - Weight initialization? - Kernel size? - Etc Optimizer: Adam - Batch size? - Learning rate? - beta1, beta2, epsilon? Regularization - …. 0 (no pedestrian) 1 (yes pedestrian) Goal: 99% classification accuracy Running example Troubleshooting - tune
130. 130. Full Stack Deep Learning Which hyper-parameters to tune? 130 Choosing hyper-parameters • More sensitive to some than others • Depends on choice of model • Rules of thumb (only) to the right • Sensitivity is relative to default values! (e.g., if you are using all-zeros weight initialization or vanilla SGD, changing to the defaults will make a big difference)
Hyperparameter | Approximate sensitivity
Learning rate | High
Learning rate schedule | High
Optimizer choice | Low
Other optimizer params (e.g., Adam beta1) | Low
Batch size | Low
Weight initialization | Medium
Loss function | High
Model depth | Medium
Layer size | High
Layer params (e.g., kernel size) | Medium
Weight of regularization | Medium
Nonlinearity | Low
Troubleshooting - tune
131. 131. Full Stack Deep Learning Method 1: manual hyperparam optimization 131 How it works • Understand the algorithm • E.g., higher learning rate means faster but less stable training • Train & evaluate model • Guess a better hyperparam value & re-evaluate • Can be combined with other methods (e.g., manually select parameter ranges to optimize over) Advantages Disadvantages • For a skilled practitioner, may require least computation to get good result • Requires detailed understanding of the algorithm • Time-consuming Troubleshooting - tune
  132. 132. Full Stack Deep Learning Method 2: grid search 132 Hyperparameter1(e.g.,batchsize) Hyperparameter 2 (e.g., learning rate) How it works Disadvantages • Super simple to implement • Can produce good results • Not very efficient: need to train on all cross-combos of hyper-parameters • May require prior knowledge about parameters to get
 good results Advantages Troubleshooting - tune
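Grid search as described above can be sketched in a few lines; `train_and_eval` is an illustrative stand-in for a full training run, and the parameter values are examples:

```python
from itertools import product

def train_and_eval(lr, batch_size):
    # Stand-in for training a model and returning val loss (illustrative)
    return abs(lr - 3e-4) + abs(batch_size - 64) / 1000

grid = {"lr": [1e-4, 3e-4, 1e-3], "batch_size": [32, 64, 128]}
results = {
    combo: train_and_eval(*combo)
    for combo in product(grid["lr"], grid["batch_size"])  # 3 x 3 = 9 runs
}
best = min(results, key=results.get)
print(best)  # (0.0003, 64)
```

The cost is the product of the grid sizes, which is why grid search scales poorly with the number of hyper-parameters.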
  133. 133. Full Stack Deep Learning Method 3: random search 133 Hyperparameter1(e.g.,batchsize) Hyperparameter 2 (e.g., learning rate) How it works Disadvantages Advantages • Easy to implement • Often produces better results than grid search • Not very interpretable • May require prior knowledge about parameters to get
 good results Troubleshooting - tune
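A random-search sketch follows; sampling the learning rate log-uniformly (since it varies over orders of magnitude) is common practice, and `train_and_eval` is again an illustrative stand-in:

```python
import math, random

rng = random.Random(0)

def sample_config(rng):
    return {
        "lr": 10 ** rng.uniform(-5, -2),   # log-uniform in [1e-5, 1e-2]
        "batch_size": rng.choice([32, 64, 128]),
    }

def train_and_eval(cfg):
    # Stand-in for a real training run (illustrative val loss)
    return abs(math.log10(cfg["lr"]) + 3.5)

trials = [sample_config(rng) for _ in range(20)]
best = min(trials, key=train_and_eval)
print(best["lr"])
```

Because each parameter is sampled independently, random search covers more distinct values per parameter than a grid with the same budget.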
  134. 134. Full Stack Deep Learning Method 4: coarse-to-fine 134 Hyperparameter1(e.g.,batchsize) Hyperparameter 2 (e.g., learning rate) How it works Disadvantages Advantages Troubleshooting - tune
  135. 135. Full Stack Deep Learning Method 4: coarse-to-fine 135 Hyperparameter1(e.g.,batchsize) Hyperparameter 2 (e.g., learning rate) How it works Disadvantages Advantages Best performers Troubleshooting - tune
  136. 136. Full Stack Deep Learning Method 4: coarse-to-fine 136 Hyperparameter1(e.g.,batchsize) Hyperparameter 2 (e.g., learning rate) How it works Disadvantages Advantages Troubleshooting - tune
  137. 137. Full Stack Deep Learning Method 4: coarse-to-fine 137 Hyperparameter1(e.g.,batchsize) Hyperparameter 2 (e.g., learning rate) How it works Disadvantages Advantages Troubleshooting - tune
  138. 138. Full Stack Deep Learning Method 4: coarse-to-fine 138 Hyperparameter1(e.g.,batchsize) Hyperparameter 2 (e.g., learning rate) How it works Disadvantages • Can narrow in on very high performing hyperparameters • Most used method in practice • Somewhat manual process Advantages etc. Troubleshooting - tune
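The narrowing procedure above can be sketched as repeated random searches over a shrinking range; `train_and_eval`, the ranges, and the number of rounds are illustrative:

```python
import random

def train_and_eval(log_lr):
    # Stand-in for a real training run (illustrative val loss)
    return (log_lr + 3.5) ** 2

def random_search(lo, hi, n, rng):
    samples = [rng.uniform(lo, hi) for _ in range(n)]
    return sorted(samples, key=train_and_eval)  # best first

rng = random.Random(0)
lo, hi = -6.0, -1.0                  # coarse range of log10(learning rate)
for stage in range(3):               # three rounds of narrowing
    ranked = random_search(lo, hi, n=10, rng=rng)
    top = ranked[:3]                 # keep the best performers
    lo, hi = min(top), max(top)      # zoom in around them
best_lr = 10 ** ranked[0]
print(best_lr)
```

Each round spends its whole budget inside the region where the previous round's best performers landed, which is why this tends to converge on good values quickly.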
139. 139. Full Stack Deep Learning Method 5: Bayesian hyperparam opt 139 How it works (at a high level) • Start with a prior estimate of parameter distributions • Maintain a probabilistic model of the relationship between hyper-parameter values and model performance • Alternate between: • Training with the hyper-parameter values that maximize the expected improvement • Using training results to update our probabilistic model • To learn more, see: https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f Advantages • Generally the most efficient hands-off way to choose hyperparameters Disadvantages • Difficult to implement from scratch • Can be hard to integrate with off-the-shelf tools Troubleshooting - tune
140. 140. Full Stack Deep Learning Method 5: Bayesian hyperparam opt 140 How it works (at a high level) • Start with a prior estimate of parameter distributions • Maintain a probabilistic model of the relationship between hyper-parameter values and model performance • Alternate between: • Training with the hyper-parameter values that maximize the expected improvement • Using training results to update our probabilistic model • To learn more, see: https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f Advantages • Generally the most efficient hands-off way to choose hyperparameters Disadvantages • Difficult to implement from scratch • Can be hard to integrate with off-the-shelf tools More on tools to do this automatically in the infrastructure & tooling lecture! Troubleshooting - tune
  141. 141. Full Stack Deep Learning Summary of how to optimize hyperparams • Coarse-to-fine random searches • Consider Bayesian hyper-parameter optimization solutions as your codebase matures 141Troubleshooting - tune
  142. 142. Full Stack Deep Learning Questions? 142Troubleshooting - tune
  143. 143. Full Stack Deep Learning Conclusion 143 • DL debugging is hard due to many competing sources of error • To train bug-free DL models, we treat building our model as an iterative process • The following steps can make the process easier and catch errors as early as possible Troubleshooting - conclusion
144. 144. Full Stack Deep Learning How to build bug-free DL models 144 Tune hyperparams Implement & debug Start simple Evaluate Improve model/data • Choose the simplest model & data possible (e.g., LeNet on a subset of your data) • Once model runs, overfit a single batch & reproduce a known result • Apply the bias-variance decomposition to decide what to do next • Use coarse-to-fine random searches • Make your model bigger if you underfit; add data or regularize if you overfit Troubleshooting - conclusion
145. 145. Full Stack Deep Learning Where to go to learn more • Andrew Ng's book Machine Learning Yearning (http://www.mlyearning.org/) • The following Twitter thread: https://twitter.com/karpathy/status/1013244313327681536 • This blog post: https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/ 145 Troubleshooting - conclusion
  146. 146. Full Stack Deep Learning Thank you! 146
