Setting up Machine Learning Projects - Full Stack Deep Learning
1. Full Stack Deep Learning
Josh Tobin, Sergey Karayev, Pieter Abbeel
Setting up Machine Learning Projects
3. Full Stack Deep Learning
Machine Learning Projects
ML Projects - overview
85% of AI projects fail [1]
[1] Pactera Technologies
4. Full Stack Deep Learning
Why do so many projects fail?
• ML is still research - you shouldn’t aim for 100% success rate
• But many are doomed to fail:
• Technically infeasible or poorly scoped
• Never make the leap to production
• Unclear success criteria
• Poor team management
5. Full Stack Deep Learning
Module overview
• Lifecycle: How to think about all of the activities in an ML project
• Prioritizing projects: Assessing the feasibility and impact of your projects
• Archetypes: The main categories of ML projects, and the implications for project management
• Metrics: How to pick a single number to optimize
• Baselines: How to know if your model is performing well
6. Full Stack Deep Learning
Running case study - pose estimation
Position (L2 loss): (x, y, z)
Orientation (L2 loss): (φ, θ, ψ)
Xiang, Yu, et al. "PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes." arXiv preprint arXiv:1711.00199 (2017).
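The two L2 losses on the slide can be sketched as follows. This is a minimal illustration, not the PoseCNN implementation; the function name and the orientation weight are hypothetical.

```python
# Hedged sketch: an L2 loss on predicted position (x, y, z) plus an
# L2 loss on predicted orientation (phi, theta, psi), combined with an
# illustrative weight w_orient (not from the slides).
def pose_l2_loss(pred_pos, true_pos, pred_orient, true_orient, w_orient=1.0):
    position_loss = sum((p - t) ** 2 for p, t in zip(pred_pos, true_pos))
    orientation_loss = sum((p - t) ** 2 for p, t in zip(pred_orient, true_orient))
    return position_loss + w_orient * orientation_loss
```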
7. Full Stack Deep Learning
Full Stack Robotics works on grasping
(x, y, z), (φ, θ, ψ) → Grasp model → Motor commands
9. Full Stack Deep Learning
Lifecycle of an ML project
Planning & project setup
ML Projects - lifecycle
10. Full Stack Deep Learning
Lifecycle of an ML project
• Decide to work on pose estimation
• Determine requirements & goals
• Allocate resources
• Etc.
11. Full Stack Deep Learning
Lifecycle of an ML project
Planning & project setup → Data collection & labeling
12. Full Stack Deep Learning
Lifecycle of an ML project
• Collect training objects
• Set up sensors (e.g., cameras)
• Capture images of objects
• Annotate with ground truth (how?)
14. Full Stack Deep Learning
Lifecycle of an ML project
• Too hard to get data
• Easier to label a different task (e.g., hard to annotate pose, easier to annotate per-pixel segmentation)
15. Full Stack Deep Learning
Lifecycle of an ML project
Planning & project setup → Data collection & labeling → Training & debugging
16. Full Stack Deep Learning
Lifecycle of an ML project
• Implement baseline in OpenCV
• Find SoTA model & reproduce
• Debug our implementation
• Improve model for our task
18. Full Stack Deep Learning
Lifecycle of an ML project
• Collect more data
• Realize data labeling is unreliable
20. Full Stack Deep Learning
Lifecycle of an ML project
• Realize task is too hard
• Requirements trade off with each other - revisit which are most important
21. Full Stack Deep Learning
Lifecycle of an ML project
Planning & project setup → Data collection & labeling → Training & debugging → Deploying & testing
• Pilot in grasping system in the lab
• Write tests to prevent regressions
• Roll out in production
22. Full Stack Deep Learning
Lifecycle of an ML project
• Doesn’t work in the lab - keep improving accuracy of model
23. Full Stack Deep Learning
Lifecycle of an ML project
• Fix data mismatch between training data and data seen in deployment
• Collect more data
• Mine hard cases
24. Full Stack Deep Learning
Lifecycle of an ML project
• The metric you picked doesn’t actually drive downstream user behavior - revisit the metric
• Performance in the real world isn’t great - revisit requirements (e.g., do we need to be faster or more accurate?)
25. Full Stack Deep Learning
Lifecycle of an ML project
Per-project activities: Planning & project setup → Data collection & labeling → Training & debugging → Deploying & testing
27. Full Stack Deep Learning
Lifecycle of an ML project
Per-project activities: Planning & project setup → Data collection & labeling → Training & debugging → Deploying & testing
Cross-project infrastructure: Team & hiring; Infra & tooling
28. Full Stack Deep Learning
What else do you need to know?
• Understand state of the art in your domain
• Understand what’s possible
• Know what to try next
• We will introduce the most promising research areas
30. Full Stack Deep Learning
Questions?
32. Full Stack Deep Learning
Key points for prioritizing projects
A. High-impact ML problems
• Complex parts of your pipeline
• Places where cheap prediction is valuable
B. Cost of ML projects is driven by data availability, but the accuracy requirement also plays a big role
ML Projects - prioritizing
33. Full Stack Deep Learning
A (general) framework for prioritizing projects
[2×2 chart: Impact vs. Feasibility (e.g., cost); high-impact, high-feasibility projects are high priority]
34. Full Stack Deep Learning
Mental models for high-impact ML projects
1. Where can you take advantage of cheap prediction?
2. Where can you automate complicated manual processes?
35. Full Stack Deep Learning
Mental models for high-impact ML projects
Prediction Machines: The Simple Economics of Artificial Intelligence (Agrawal, Gans, Goldfarb)
• AI reduces cost of prediction
• Prediction is central for decision making
• Cheap prediction means
• Prediction will be everywhere
• Even in problems where it was too expensive
before (e.g., for most people, hiring a driver)
• Implication: Look for projects where cheap
prediction will have a huge business impact
37. Full Stack Deep Learning
Mental models for high-impact ML projects
Software 2.0 (Andrej Karpathy): https://medium.com/@karpathy/software-2-0-a64152b37c35
• Software 1.0 = traditional programs with explicit instructions (Python / C++ / etc.)
• Software 2.0 = humans specify goals, and algorithm
searches for a program that works
• 2.0 programmers work with datasets, which get compiled via
optimization
• Why? Works better, more general, computational advantages
• Implication: look for complicated rule-based software where
we can learn the rules instead of programming them
39. Full Stack Deep Learning
Assessing feasibility of ML projects
Cost drivers and main considerations:
• Data availability: How hard is it to acquire data? How expensive is data labeling? How much data will be needed?
• Accuracy requirement: How costly are wrong predictions? How frequently does the system need to be right to be useful?
• Problem difficulty: Is there good published work on similar problems? (newer problems mean more risk & more technical effort) How much compute is needed for training? How much compute is available for deployment?
41. Full Stack Deep Learning
Why are accuracy requirements so important?
[Chart: project cost vs. required accuracy (50%, 90%, 99%, 99.9%, …)]
ML project costs tend to scale super-linearly in the accuracy requirement
45. Full Stack Deep Learning
What’s still hard in machine learning?
Examples:
• Recognize content of images
• Understand speech
• Translate speech
• Grasp objects
• etc.
Counter-examples:
• Understand humor / sarcasm
• In-hand robotic manipulation
• Generalize to new scenarios
• etc.
46. Full Stack Deep Learning
What’s still hard in machine learning?
• Unsupervised learning
• Reinforcement learning
• Both are showing promise in limited domains where tons of data and
compute are available
47. Full Stack Deep Learning
What’s still hard in supervised learning?
• Answering questions
• Summarizing text
• Predicting video
• Building 3D models
• Real-world speech recognition
• Resisting adversarial examples
• Doing math
• Solving word puzzles
• Bongard problems
• Etc.
48. Full Stack Deep Learning
What types of problems are hard?
• Output is complex. Instances: high-dimensional output; ambiguous output. Examples: 3D reconstruction; video prediction; dialog systems.
• Reliability is required. Instances: high precision is required; robustness is required. Examples: failing safely out-of-distribution; robustness to adversarial attacks; high-precision pose estimation.
• Generalization is required. Instances: out-of-distribution data; reasoning, planning, causality. Examples: self-driving edge cases; self-driving control; small data.
49. Full Stack Deep Learning
Why is FSR focusing on pose estimation?
Impact:
• FSR’s goal is grasping, which requires reliable pose estimation
• Traditional robotics pipeline uses hand-designed heuristics & online optimization: slow and brittle
• Great candidate for Software 2.0!
Feasibility:
• Data availability: easy to collect data; labeling data could be a challenge, but can instrument lab with sensors
• Accuracy requirement: requires high accuracy to grasp an object (<0.5cm); however, low cost of failure (picks per hour matter, not % successes)
• Problem difficulty: similar published results exist, but need to adapt to our objects and robot
50. Full Stack Deep Learning
Key points for prioritizing projects
A. To find high-impact ML problems, look for complex parts
of your pipeline and places where cheap prediction is
valuable
B. The cost of ML projects is primarily driven by data
availability, but your accuracy requirement also plays a
big role
51. Full Stack Deep Learning
Questions?
53. Full Stack Deep Learning
Machine learning project archetypes
ML Projects - archetypes
Improve an existing process
Examples:
• Improve code completion in an IDE
• Build a customized recommendation system
• Build a better video game AI
54. Full Stack Deep Learning
Machine learning project archetypes
Augment a manual process
Examples:
• Turn sketches into slides
• Email auto-completion
• Help a radiologist do their job faster
55. Full Stack Deep Learning
Machine learning project archetypes
Automate a manual process
Examples:
• Full self-driving
• Automated customer support
• Automated website design
56. Full Stack Deep Learning
Machine learning project archetypes
Key questions
Improve an existing process:
• Do your models truly improve performance?
• Does performance improvement generate business value?
• Do performance improvements lead to a data flywheel?
57. Full Stack Deep Learning
Machine learning project archetypes
Augment a manual process:
• How good does the system need to be to be useful?
• How can you collect enough data to make it that good?
58. Full Stack Deep Learning
Machine learning project archetypes
Automate a manual process:
• What is an acceptable failure rate for the system?
• How can you guarantee that it won’t exceed that failure rate?
• How inexpensively can you label data from the system?
60. Full Stack Deep Learning
Data flywheels
[Flywheel: more users → more data → better model → more users]
61. Full Stack Deep Learning
Data flywheels
Do you have a data loop? Automatically collect data (and ideally labels) from your users.
62. Full Stack Deep Learning
Data flywheels
Your job as an ML practitioner: turn more data into a better model.
63. Full Stack Deep Learning
Data flywheels
Do better predictions make the product better?
64. Full Stack Deep Learning
Machine learning project archetypes
[2×2 chart: Impact vs. Feasibility (e.g., cost), with the three archetypes placed on it: improve an existing process, augment a manual process, automate a manual process]
65. Full Stack Deep Learning
Machine learning project archetypes
Implement a data loop that allows you to improve on this task and future ones.
66. Full Stack Deep Learning
Machine learning project archetypes
Good product design: release a ‘good enough’ version.
67. Full Stack Deep Learning
Product design can reduce need for accuracy
See “Designing Collaborative AI” (Ben Reinhardt and Belmer Negrillo):
https://medium.com/@Ben_Reinhardt/designing-collaborative-ai-5c1e8dbc8810
68. Full Stack Deep Learning
Machine learning project archetypes
Add humans in the loop. Add guardrails and/or limit initial scope.
69. Full Stack Deep Learning
Questions?
71. Full Stack Deep Learning
Key points for choosing a metric
A. The real world is messy; you usually care about lots of
metrics
B. However, ML systems work best when optimizing a single
number
C. As a result, you need to pick a formula for combining
metrics
D. This formula can and will change!
ML Projects - metrics
72. Full Stack Deep Learning
Review of accuracy, precision, and recall
Confusion matrix (n=100):
             Predicted: NO   Predicted: YES   Total
Actual: NO         5                5           10
Actual: YES       45               45           90
Total             50               50
73. Full Stack Deep Learning
Review of accuracy, precision, and recall
Accuracy = Correct / Total = 50%
74. Full Stack Deep Learning
Review of accuracy, precision, and recall
Precision = true positives / (true positives + false positives) = 45 / 50 = 90%
75. Full Stack Deep Learning
Review of accuracy, precision, and recall
Recall = true positives / actual YES = 45 / 90 = 50%
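The three quantities can be checked directly against the confusion matrix from the slides, e.g.:

```python
# Confusion matrix from the slides (n=100):
tn, fp = 5, 5    # actual NO:  predicted NO, predicted YES
fn, tp = 45, 45  # actual YES: predicted NO, predicted YES

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct / total = 50/100 = 0.5
precision = tp / (tp + fp)                  # 45 / 50 = 0.9
recall = tp / (tp + fn)                     # 45 / 90 = 0.5
```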
76. Full Stack Deep Learning
Why choose a single metric?
Which is best?
          Precision   Recall
Model 1      0.9        0.5
Model 2      0.8        0.7
Model 3      0.7        0.9
77. Full Stack Deep Learning
How to combine metrics
• Simple average / weighted average
• Threshold n-1 metrics, evaluate the nth
• More complex / domain-specific formula
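The first two combination strategies can be sketched on the three example models from this module (the precision/recall values are from the slides; the variable names are illustrative):

```python
# Precision and recall for the three example models.
models = {"Model 1": (0.9, 0.5), "Model 2": (0.8, 0.7), "Model 3": (0.7, 0.9)}

# Strategy 1: simple average of precision and recall.
avg = {name: (p + r) / 2 for name, (p, r) in models.items()}

# Strategy 2: threshold n-1 metrics, evaluate the nth.
# Here: require recall > 0.6, then compare on precision
# (a model that misses the threshold scores 0).
p_at_r = {name: (p if r > 0.6 else 0.0) for name, (p, r) in models.items()}
```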
79. Full Stack Deep Learning
Combining precision and recall
          Precision   Recall   (p + r) / 2
Model 1      0.9        0.5       0.7
Model 2      0.8        0.7       0.75
Model 3      0.7        0.9       0.8
83. Full Stack Deep Learning
Thresholding metrics
Choosing which metrics to threshold:
• Domain judgment (e.g., which metrics can you engineer around?)
• Which metrics are least sensitive to model choice?
• Which metrics are closest to desirable values?
Choosing threshold values:
• Domain judgment (e.g., what is an acceptable tolerance downstream? What performance is achievable?)
• How well does the baseline model do?
• How important is this metric right now?
85. Full Stack Deep Learning
Combining precision and recall
          Precision   Recall   (p + r) / 2   p @ (r > 0.6)
Model 1      0.9        0.5       0.7            0.0
Model 2      0.8        0.7       0.75           0.8
Model 3      0.7        0.9       0.8            0.7
90. Full Stack Deep Learning
Domain-specific metrics: mAP
[Figure: precision-recall curve, precision vs. recall, both axes 0-100%]
90
mAP = mean AP,
e.g., over classes
2. Choosing metrics
ML Projects - metrics
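A rough sketch of how AP and mAP are computed: AP is the area under a class's precision-recall curve, and mAP is the mean of per-class APs. This is a simplification; benchmarks like PASCAL VOC and COCO use interpolated variants, and the curve values below are illustrative.

```python
# Sketch: AP as area under the precision-recall curve (rectangle rule),
# mAP as the mean of per-class APs.

def average_precision(recalls, precisions):
    # recalls sorted ascending; each precision is paired with the
    # recall level at which it was measured.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_average_precision(per_class_curves):
    aps = [average_precision(r, p) for r, p in per_class_curves]
    return sum(aps) / len(aps)

# Illustrative two-class example.
curves = [
    ([0.5, 1.0], [1.0, 0.5]),   # AP = 0.5*1.0 + 0.5*0.5 = 0.75
    ([0.25, 1.0], [1.0, 0.4]),  # AP = 0.25*1.0 + 0.75*0.4 = 0.55
]
print(round(mean_average_precision(curves), 4))  # 0.65
```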
92. Full Stack Deep Learning
Combining precision and recall
92
          Precision  Recall  (p + r) / 2  p @ (r > 0.6)  mAP
Model 1   0.9        0.5     0.7          0.0            0.7
Model 2   0.8        0.7     0.75         0.8            0.6
Model 3   0.7        0.9     0.8          0.7            0.6
2. Choosing metrics
ML Projects - metrics
94. Full Stack Deep Learning
Example: choosing a metric for pose estimation
Position (x, y, z): L2 loss
Orientation (φ, θ, ψ): L2 loss
Prediction time t
94
Xiang, Yu, et al. "PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes." arXiv preprint arXiv:1711.00199 (2017).
2. Choosing metrics
ML Projects - metrics
95. Full Stack Deep Learning
Example: choosing a metric for pose estimation
• Enumerate requirements
• Downstream goal is real-time robotic grasping
• Position error must be <1 cm; not sure exactly how
much precision is needed
• Angular error <5 degrees
• Must run in 100ms to work in real-time
95
2. Choosing metrics
ML Projects - metrics
96. Full Stack Deep Learning
Example: choosing a metric for pose estimation
• Enumerate requirements
• Evaluate current performance
• Train a few models
96
2. Choosing metrics
ML Projects - metrics
97. Full Stack Deep Learning
Example: choosing a metric for pose estimation
• Enumerate requirements
• Evaluate current performance
• Compare current performance to requirements
• Position error between 0.75 and 1.25cm (depending on
hyperparameters)
• All angular errors around 60 degrees
• Inference time ~300ms
97
2. Choosing metrics
ML Projects - metrics
98. Full Stack Deep Learning
Example: choosing a metric for pose estimation
• Enumerate requirements
• Evaluate current performance
• Compare current performance to requirements
• Prioritize angular error
• Threshold position error at 1cm
• Ignore run time for now
98
2. Choosing metrics
ML Projects - metrics
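The decision above (threshold position error at 1 cm, optimize angular error, ignore run time for now) can be sketched as a single scoring function. The names and the infinite-penalty convention are our own, not from the lecture.

```python
# Sketch of the thresholded pose metric: lower is better; a model
# that violates the position requirement is ruled out outright.

POSITION_TOLERANCE_CM = 1.0

def pose_metric(angular_error_deg, position_error_cm):
    if position_error_cm > POSITION_TOLERANCE_CM:
        return float("inf")
    return angular_error_deg

# Hypothetical training runs: (angular error in degrees, position error in cm).
candidates = {
    "run A": (58.0, 0.8),   # meets the position requirement
    "run B": (52.0, 1.2),   # better angle, but position too coarse
}
best = min(candidates, key=lambda k: pose_metric(*candidates[k]))
print(best)  # run A
```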
99. Full Stack Deep Learning
Example: choosing a metric for pose estimation
• Enumerate requirements
• Evaluate current performance
• Compare current performance to requirements
• Revisit metric as your numbers improve
99
2. Choosing metrics
ML Projects - metrics
100. Full Stack Deep Learning
Key points for choosing a metric
A. The real world is messy; you usually care about lots of
metrics
B. However, ML systems work best when optimizing a single
number
C. As a result, you need to pick a formula for combining
metrics
D. This formula can and will change!
100
2. Choosing metrics
ML Projects - metrics
101. Full Stack Deep Learning
Questions?
101
ML Projects - metrics
102. Full Stack Deep Learning
Metrics
Module overview
102
Prioritizing
projects
Lifecycle
Archetypes
Baselines
• How to think about all of the activities in an
ML project
• Assessing the feasibility and impact of your
projects
• The main categories of ML projects, and the
implications for project management
• How to pick a single number to optimize
• How to know if your model is performing
well
ML Projects - baselines
103. Full Stack Deep Learning
Key points for choosing baselines
A. Baselines give you a lower bound on expected model
performance
B. The tighter the lower bound, the more useful the baseline
(e.g., published results, carefully tuned pipelines, & human
baselines are better)
103
ML Projects - baselines
104. Full Stack Deep Learning
Why are baselines important?
104
ML Projects - baselines
105. Full Stack Deep Learning
Why are baselines important?
105
Same model, different baseline → different next steps
ML Projects - baselines
110. Full Stack Deep Learning
Where to look for baselines
110
External
baselines
Internal
baselines
• Business / engineering requirements
• Published results (make sure the comparison is fair!)
• Scripted baselines (e.g., OpenCV, rules-based)
• Simple ML baselines (e.g., bag of words, linear regression)
• Human performance
ML Projects - baselines
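A scripted, rules-based baseline of the kind listed above can be only a few lines. This toy keyword rule for sentiment is entirely illustrative (the word lists and data are invented); the point is that a deep model should beat something like this before it earns its complexity.

```python
# Sketch: a scripted (rules-based) sentiment baseline on toy data.

POSITIVE = {"great", "loved", "good"}
NEGATIVE = {"terrible", "hated", "bad"}

def rule_based_sentiment(text):
    # Predict 1 (positive) when positive keywords outnumber negative ones.
    words = set(text.lower().split())
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    return 1 if pos >= neg else 0

examples = [("great movie", 1), ("terrible movie", 0),
            ("loved it", 1), ("i hated it", 0)]
accuracy = sum(rule_based_sentiment(t) == y for t, y in examples) / len(examples)
print(accuracy)  # 1.0
```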
111. Full Stack Deep Learning
How to create good human baselines
111
Quality of baseline increases, and ease of data collection decreases, moving down this list:
• Random people (e.g., Amazon Mechanical Turk)
• Ensemble of random people
• Domain experts (e.g., doctors)
• Deep domain experts (e.g., specialists)
• Mixture of experts
ML Projects - baselines
112. Full Stack Deep Learning
How to create good human baselines
• Highest quality that allows more data to be labeled easily
• More specialized domains need more skilled labelers
• Find cases where model performs worse and concentrate data collection
there
112
More on labeling in data lecture!
ML Projects - baselines
113. Full Stack Deep Learning
Key points for choosing baselines
A. Baselines give you a lower bound on expected model
performance
B. The tighter the lower bound, the more useful the baseline
(e.g., published results, carefully tuned pipelines, human
baselines are better)
113
ML Projects - baselines
114. Full Stack Deep Learning
Questions?
114
ML Projects - baselines
115. Full Stack Deep Learning
Metrics
Conclusion
115
Prioritizing
projects
Lifecycle
Archetypes
Baselines
• ML projects are iterative. Deploy something
fast to begin the cycle.
• Choose projects that are high impact, with a
low cost of wrong predictions
• The secret sauce to making projects work
well is to build automated data flywheels
• In the real world you care about many
metrics, but you should always have just one
you're optimizing at a time
• Good baselines help you invest your effort
the right way
ML Projects - baselines
116. Full Stack Deep Learning
Where to go to learn more
• Andrew Ng’s “Machine Learning Yearning”
• Andrej Karpathy’s “Software 2.0”
• Agrawal, Gans, and Goldfarb’s “Prediction Machines: The Simple Economics of AI”
116
ML Projects - conclusion