Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)
Full Stack Deep Learning - UC Berkeley Spring 2021 - Sergey Karayev, Josh Tobin, Pieter Abbeel
Setting up ML Projects
Machine Learning Projects
ML Projects - overview
85% of AI projects fail¹
¹ Pactera Technologies
Why do so many projects fail?
• ML is still research - you shouldn’t aim for 100% success rate
• But, many are doomed to fail:
• Technically infeasible or poorly scoped
• Never make the leap to production
• Unclear success criteria
• Poor team management
Module overview
• Lifecycle - how to think about all of the activities in an ML project
• Prioritizing projects - assessing the feasibility and impact of your projects
• Archetypes - the main categories of ML projects, and the implications for project management
• Metrics - how to pick a single number to optimize
• Baselines - how to know if your model is performing well
Running case study - pose estimation
Position (L2 loss): (x, y, z)
Orientation (L2 loss): (φ, θ, ψ)
Xiang, Yu, et al. "PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes." arXiv preprint arXiv:1711.00199 (2017).
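The two losses above can be sketched in a few lines of Python. This is a minimal, illustrative version (not the actual PoseCNN formulation), assuming pose is a position vector (x, y, z) plus three Euler angles; the weight `w` trading off the two terms is an assumed knob, and a real system would also handle angle wrap-around (e.g., via quaternions), which this sketch ignores.

```python
import math

def l2(pred, true):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)))

def pose_loss(pred_pos, true_pos, pred_rot, true_rot, w=1.0):
    """Position L2 plus weighted orientation L2 (illustrative only).

    pred_pos/true_pos: (x, y, z); pred_rot/true_rot: Euler angles.
    Naive L2 on angles ignores wrap-around at +/- pi.
    """
    return l2(pred_pos, true_pos) + w * l2(pred_rot, true_rot)

# Example: 2 cm position error, exact orientation
loss = pose_loss((0.10, 0.20, 0.30), (0.10, 0.22, 0.30),
                 (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
```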
Full Stack Robotics works on grasping
Object pose (x, y, z), (φ, θ, ψ) → Grasp model → Motor commands
ML Projects - lifecycle
Lifecycle of an ML project
Planning & project setup
• Decide to work on pose estimation
• Determine requirements & goals
• Allocate resources
• Consider the ethical implications
• Etc.
Data collection & labeling
• Collect training objects
• Set up sensors (e.g., cameras)
• Capture images of objects
• Annotate with ground truth (how?)
Looping back - from data collection to planning:
• Too hard to get data
• Easier to label a different task (e.g., hard to annotate pose, easier to annotate per-pixel segmentation)
Training & debugging
• Implement baseline in OpenCV
• Find SoTA model & reproduce
• Debug our implementation
• Improve model for our task
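The "baseline first, then improve" loop can be made concrete with a toy evaluation harness. Everything here is a stand-in: the baseline is a constant predictor (in place of a classical OpenCV heuristic), the "model" is a hard-coded function, and the metric is mean L2 position error.

```python
import math

def mean_l2_error(predict, dataset):
    """Average Euclidean error of a predictor over (input, true_position) pairs."""
    total = 0.0
    for x, true_pos in dataset:
        pred = predict(x)
        total += math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true_pos)))
    return total / len(dataset)

# Toy data; inputs are None because the stand-in predictors ignore them
dataset = [(None, (0.1, 0.2, 0.3)), (None, (0.2, 0.2, 0.3))]

baseline = lambda x: (0.0, 0.0, 0.0)   # stand-in for a classical-CV baseline
model = lambda x: (0.15, 0.2, 0.3)     # stand-in for the trained model

# The debugging step: verify the model actually beats the baseline
assert mean_l2_error(model, dataset) < mean_l2_error(baseline, dataset)
```

Keeping this comparison in a test means a regression against the baseline is caught automatically.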
Looping back - from training to data collection:
• Collect more data
• Realize data labeling is unreliable
Looping back - from training to planning:
• Realize the task is too hard
• Requirements trade off with each other - revisit which are most important
Deploying & testing
• Pilot in the grasping system in the lab
• Write tests to prevent regressions and evaluate for biases
• Roll out in production
Looping back - from deploying to training:
• Doesn't work in the lab - keep improving the accuracy of the model
Looping back - from deploying to data collection:
• Fix data mismatch between training data and data seen in deployment
• Collect more data
• Mine hard cases
Looping back - from deploying to planning:
• The metric you picked doesn't actually drive downstream user behavior - revisit the metric
• Performance in the real world isn't great - revisit requirements (e.g., do we need to be faster or more accurate?)
Per-project activities: planning & project setup → data collection & labeling → training & debugging → deploying & testing (looping back between stages as needed)

Cross-project infrastructure: team & hiring; infra & tooling
What else do you need to know?
• Understand state of the art in your domain
• Understand what’s possible
• Know what to try next
• We will introduce the most promising research areas
ML Projects - prioritizing
Key points for prioritizing projects
A. High-impact ML problems
• Friction in your product
• Complex parts of your pipeline
• Places where cheap prediction is valuable
• What else are people doing?
B. Cost of ML projects is driven by data availability. Also consider accuracy requirements and the intrinsic difficulty of the problem
A (general) framework for prioritizing projects
[2x2 chart: Impact (low→high) vs. Feasibility, e.g., cost (low→high); high-impact, high-feasibility projects are high priority]
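One way to operationalize the 2x2 above is a simple ranking. This is a toy sketch, not from the lecture: impact and feasibility scores in [0, 1] are assumed to come from your own assessment, and multiplying them is one reasonable combination rule among several.

```python
def prioritize(projects):
    """Rank candidate ML projects by impact * feasibility (both in [0, 1])."""
    return sorted(projects, key=lambda p: p["impact"] * p["feasibility"],
                  reverse=True)

candidates = [
    {"name": "pose estimation", "impact": 0.9,  "feasibility": 0.7},
    {"name": "full autonomy",   "impact": 0.95, "feasibility": 0.1},
    {"name": "log dashboard",   "impact": 0.3,  "feasibility": 0.9},
]

ranked = prioritize(candidates)
# High impact AND high feasibility wins; high impact alone does not
```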
Mental models for high-impact ML projects
1. Where can you take advantage of cheap prediction?
2. Where is there friction in your product?
3. Where can you automate complicated manual processes?
4. What are other people doing?
What does ML make economically feasible?
Prediction Machines: The Simple Economics of Artificial Intelligence (Agrawal, Gans, Goldfarb)
• AI reduces cost of prediction
• Prediction is central for decision making
• Cheap prediction means:
• Prediction will be everywhere
• Even in problems where it was too expensive before (e.g., for most people, hiring a driver)
• Implication: look for projects where cheap prediction will have a huge business impact
What does your product need?
“Discover Weekly removed the friction of chasing everything down yourself and instead brought the music to you in a neat little package every Monday morning.”
What is ML good at?
Software 2.0 (Andrej Karpathy): https://medium.com/@karpathy/software-2-0-a64152b37c35
• Software 1.0 = traditional programs with explicit instructions (Python / C++ / etc.)
• Software 2.0 = humans specify goals, and an algorithm searches for a program that works
• 2.0 programmers work with datasets, which get compiled via optimization
• Why? Works better, is more general, and has computational advantages
• Implication: look for complicated rule-based software where we can learn the rules instead of programming them
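A toy illustration of the contrast (an assumed example, not Karpathy's): in Software 1.0 a programmer writes the rule; in Software 2.0 the programmer specifies the goal as a labeled dataset, and an optimization searches for the rule's parameter.

```python
# Software 1.0: the rule is written by hand
def is_spam_v1(num_links):
    return num_links > 5  # threshold chosen by a programmer

# Software 2.0 (toy): the goal is specified as labeled data...
data = [(0, False), (1, False), (2, False), (8, True), (9, True), (12, True)]

def accuracy(threshold):
    return sum((n > threshold) == label for n, label in data) / len(data)

# ...and a (brute-force) search "compiles" the dataset into a program
best_threshold = max(range(15), key=accuracy)

def is_spam_v2(num_links):
    return num_links > best_threshold
```

Real Software 2.0 replaces the brute-force search with gradient descent over millions of parameters, but the division of labor is the same: humans curate data, optimization finds the program.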
What are other people doing?
• Human-Centric Machine Learning Infrastructure @Netflix (Ville Tuulos, InfoQ 2019)
• 2020 state of enterprise machine learning (Algorithmia, 2020)
• Papers from Google, Facebook, Nvidia, Netflix, etc
• Blog posts from top earlier-stage companies (Uber, Lyft, Spotify, Stripe, etc.)
Case studies
• Using Machine Learning to Predict Value of Homes On Airbnb (Robert Chang, Airbnb Engineering & Data Science, 2017)
• Using Machine Learning to Improve Streaming Quality at Netflix (Chaitanya Ekanadham, Netflix Technology Blog, 2018)
• 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Bernardi et al., KDD, 2019)
• How we grew from 0 to 4 million women on our fashion app, with a vertical machine learning approach (Gabriel Aldamiz, HackerNoon, 2018)
• Machine Learning-Powered Search Ranking of Airbnb Experiences (Mihajlo Grbovic, Airbnb Engineering & Data Science, 2019)
• From shallow to deep learning in fraud (Hao Yi Ong, Lyft Engineering, 2018)
• Space, Time and Groceries (Jeremy Stanley, Tech at Instacart, 2017)
• Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning (Brad Neuberg, Dropbox Engineering, 2017)
• Scaling Machine Learning at Uber with Michelangelo (Jeremy Hermann and Mike Del Balso, Uber Engineering, 2019)
• Spotify’s Discover Weekly: How machine learning finds your new music (Umesh .A Bhat, 2017)
Credit for compiling this list: Chip Huyen (Machine Learning Systems Design Lecture 2 note)
Assessing feasibility of ML projects
Cost drivers → main considerations:

Data availability
• How hard is it to acquire data?
• How expensive is data labeling?
• How much data will be needed?
• How stable is the data?
• Data security requirements?

Accuracy requirement
• How costly are wrong predictions?
• How frequently does the system need to be right to be useful?
• Ethical implications?

Problem difficulty
• Is the problem well-defined?
• Good published work on similar problems? (newer problems mean more risk & more technical effort)
• Compute requirements?
• Can a human do it?
Why are accuracy requirements so important?
[Chart: project cost vs. required accuracy (50%, 90%, 99%, 99.9%, …)]
ML project costs tend to scale super-linearly in the accuracy requirement
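To make "super-linear" concrete, here is a toy cost model - purely illustrative, not from the lecture: if cost grows like 1 / (1 - accuracy), then each additional "nine" of accuracy multiplies the cost by roughly 10.

```python
def toy_cost(accuracy):
    """Illustrative only: relative cost proportional to 1 / (1 - accuracy)."""
    return 1.0 / (1.0 - accuracy)

for acc in (0.5, 0.9, 0.99, 0.999):
    print(f"{acc:.1%} -> relative cost {toy_cost(acc):.0f}x")
# 90% -> 99% costs ~10x more; 99% -> 99.9% costs ~10x more again
```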
What’s still hard in machine learning?
Examples:
• Recognize content of images
• Understand speech
• Translate speech
• Grasp objects
• etc.

Counter-examples?
• Understand humor / sarcasm
• In-hand robotic manipulation
• Generalize to new scenarios
• etc.
What’s still hard in machine learning?
• Unsupervised learning
• Reinforcement learning
• Both are showing promise in limited domains where tons of data and compute are available
What’s still hard in supervised learning?
• Answering questions
• Summarizing text
• Predicting video
• Building 3D models
• Real-world speech recognition
• Resisting adversarial examples
• Doing math
• Solving word puzzles
• Bongard problems
• Etc
What types of problems are hard?
Output is complex
• Instances: high-dimensional output; ambiguous output
• Examples: 3D reconstruction; video prediction; dialog systems; open-ended recommender systems

Reliability is required
• Instances: high precision is required; robustness is required
• Examples: failing safely out-of-distribution; robustness to adversarial attacks; high-precision pose estimation

Generalization is required
• Instances: out-of-distribution data; reasoning, planning, causality; small data
• Examples: self-driving edge cases; self-driving control
Why is FSR focusing on pose estimation?
Impact
• FSR's goal is grasping - requires reliable pose estimation
• Traditional robotics pipeline uses hand-designed heuristics & online optimization: slow, brittle
• Great candidate for Software 2.0!

Feasibility
• Data availability: easy to collect data; labeling could be a challenge, but we can instrument the lab with sensors
• Accuracy requirement: high accuracy (<0.5cm) required to grasp an object; however, low cost of failure - picks per hour matter, not % successes
• Problem difficulty: similar published results exist, but we need to adapt to our objects and robot
How to run an ML feasibility assessment
A. Are you sure you need ML at all?
B. Put in the work up-front to define success criteria with all of the stakeholders
C. Consider the ethics of using ML
D. Do a literature review
E. Try to rapidly build a labeled benchmark dataset
F. Build a *minimal* viable product (e.g., manual rules)
G. Are you sure you need ML at all?
ML Projects - archetypes
Machine learning product archetypes

Software 2.0 - examples:
• Improve code completion in an IDE
• Build a customized recommendation system
• Build a better video game AI

Human-in-the-loop - examples:
• Turn sketches into slides
• Email auto-completion
• Help a radiologist do their job faster

Autonomous systems - examples:
• Full self-driving
• Automated customer support
• Automated website design
Software 2.0 - key questions:
• Do your models truly improve performance?
• Does performance improvement generate business value?
• Do performance improvements lead to a data flywheel?

Human-in-the-loop - key questions:
• How good does the system need to be to be useful?
• How can you collect enough data to make it that good?

Autonomous systems - key questions:
• What is an acceptable failure rate for the system?
• How can you guarantee that it won't exceed that failure rate?
• How inexpensively can you label data from the system?
Data flywheels

More users → more data → better model → more users (and around again)
• Do you have a data loop? Automatically collect data (and ideally labels) from your users
• Better model: this is your job as an ML practitioner
• Do better predictions make the product better? That is what closes the loop
Machine learning product archetypes

[2x2 chart: Impact (low→high) vs. Feasibility (low→high), with Software 2.0, Human-in-the-loop, and Autonomous systems plotted by archetype]
• Implement a data loop that allows you to improve on this task and future ones
• Good product design: release a 'good enough' version
Product design can reduce need for accuracy
See “Designing Collaborative AI” (Ben Reinhardt and Belmer Negrillo): https://medium.com/@Ben_Reinhardt/designing-collaborative-ai-5c1e8dbc8810
Apple’s ML product design guidelines
• What role does ML play in your app?
• Critical or complementary?
• Private or public?
• Proactive or retroactive?
• Visible or invisible?
• Dynamic or static?
https://developer.apple.com/design/human-interface-guidelines/machine-learning/overview/introduction/
• How can you learn from your users?
• Explicit feedback (e.g., “suggest less pop music”)
• Implicit feedback (e.g., “like this song”)
• Calibration during setup (e.g., scan your face for FaceID)
• Corrections (e.g., fix mistakes model made)
• How should your app handle mistakes?
• Limitations (let your users know where you expect the model to perform well)
• Corrections (let your users succeed even if the model fails)
• Attributions (help users understand where suggestions come from)
• Confidence (help users gauge quality of results)
75
https://developer.apple.com/design/human-interface-guidelines/machine-learning/overview/introduction/
76. Full Stack Deep Learning - UC Berkeley Spring 2021
Machine learning product design
76
Guidelines for Human-AI Interaction
https://www.microsoft.com/en-us/research/project/guidelines-for-human-ai-interaction/
ML Projects - archetypes
81. Full Stack Deep Learning - UC Berkeley Spring 2021
Product design can reduce need for accuracy
81
https://spotify.design/article/three-principles-for-designing-ml-powered-products
ML Projects - archetypes
82. Full Stack Deep Learning - UC Berkeley Spring 2021
Machine learning product archetypes
[Figure: feasibility vs. impact chart]
Autonomous systems: add humans in the loop. Add guardrails and/or limit initial scope.
ML Projects - archetypes
83. Full Stack Deep Learning - UC Berkeley Spring 2021
Questions?
83
ML Projects - archetypes
84. Full Stack Deep Learning - UC Berkeley Spring 2021
Module overview

• Lifecycle: how to think about all of the activities in an ML project
• Prioritizing projects: assessing the feasibility and impact of your projects
• Archetypes: the main categories of ML projects, and the implications for project management
• Metrics: how to pick a single number to optimize
• Baselines: how to know if your model is performing well
ML Projects - metrics
85. Full Stack Deep Learning - UC Berkeley Spring 2021
Key points for choosing a metric
A. The real world is messy; you usually care about lots of
metrics
B. However, ML systems work best when optimizing a single
number
C. As a result, you need to pick a formula for combining
metrics
D. This formula can and will change!
85
ML Projects - metrics
86. Full Stack Deep Learning - UC Berkeley Spring 2021
Review of accuracy, precision, and recall
Confusion matrix (n = 100):

              Predicted: NO   Predicted: YES   Total
Actual: NO          5                5           10
Actual: YES        45               45           90
Total              50               50          100
ML Projects - metrics
87. Full Stack Deep Learning - UC Berkeley Spring 2021
Review of accuracy, precision, and recall
Accuracy = Correct / Total = (5 + 45) / 100 = 50%
ML Projects - metrics
88. Full Stack Deep Learning - UC Berkeley Spring 2021
Review of accuracy, precision, and recall
Precision = true positives / (true positives + false positives) = 45 / (45 + 5) = 90%
ML Projects - metrics
89. Full Stack Deep Learning - UC Berkeley Spring 2021
Review of accuracy, precision, and recall
Recall = true positives / actual YES = 45 / 90 = 50%
ML Projects - metrics
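The three definitions above fall straight out of the confusion-matrix counts. A minimal sketch in Python, using the counts from the slide:

```python
# Confusion-matrix counts from the slide (n = 100).
tp = 45  # actual YES, predicted YES
fn = 45  # actual YES, predicted NO
fp = 5   # actual NO,  predicted YES
tn = 5   # actual NO,  predicted NO

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct / total
precision = tp / (tp + fp)                  # of everything predicted YES, how much was right
recall = tp / (tp + fn)                     # of everything actually YES, how much was found

print(accuracy, precision, recall)  # 0.5 0.9 0.5
```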
90. Full Stack Deep Learning - UC Berkeley Spring 2021
Why choose a single metric?
            Precision   Recall
Model 1        0.9        0.5
Model 2        0.8        0.7
Model 3        0.7        0.9

Which is best?
ML Projects - metrics
91. Full Stack Deep Learning - UC Berkeley Spring 2021
How to combine metrics
• Simple average / weighted average
• Threshold n-1 metrics, evaluate the nth
• More complex / domain-specific formula
91
ML Projects - metrics
92. Full Stack Deep Learning - UC Berkeley Spring 2021
Combining precision and recall
            Precision   Recall
Model 1        0.9        0.5
Model 2        0.8        0.7
Model 3        0.7        0.9
ML Projects - metrics
93. Full Stack Deep Learning - UC Berkeley Spring 2021
Combining precision and recall
            Precision   Recall   (p + r) / 2
Model 1        0.9        0.5        0.7
Model 2        0.8        0.7        0.75
Model 3        0.7        0.9        0.8
ML Projects - metrics
97. Full Stack Deep Learning - UC Berkeley Spring 2021
Thresholding metrics
Choosing which metrics to threshold:
• Domain judgment (e.g., which metrics can you engineer around?)
• Which metrics are least sensitive to model choice?
• Which metrics are closest to desirable values?

Choosing threshold values:
• Domain judgment (e.g., what is an acceptable tolerance downstream? What performance is achievable?)
• How well does the baseline model do?
• How important is this metric right now?
ML Projects - metrics
99. Full Stack Deep Learning - UC Berkeley Spring 2021
Combining precision and recall
            Precision   Recall   (p + r) / 2   p @ (r > 0.6)
Model 1        0.9        0.5        0.7            0.0
Model 2        0.8        0.7        0.75           0.8
Model 3        0.7        0.9        0.8            0.7
ML Projects - metrics
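The two combination schemes in the table can be written as small scoring functions. A sketch in Python, using the three models from the slide (note how the two schemes disagree on the winner):

```python
# Precision/recall numbers for the three models on the slide.
models = {
    "Model 1": {"precision": 0.9, "recall": 0.5},
    "Model 2": {"precision": 0.8, "recall": 0.7},
    "Model 3": {"precision": 0.7, "recall": 0.9},
}

def average(m):
    # Simple average of the two metrics.
    return (m["precision"] + m["recall"]) / 2

def precision_at_recall(m, min_recall=0.6):
    # Threshold one metric, evaluate the other: recall below
    # the threshold disqualifies the model entirely.
    return m["precision"] if m["recall"] > min_recall else 0.0

best_by_average = max(models, key=lambda name: average(models[name]))
best_by_threshold = max(models, key=lambda name: precision_at_recall(models[name]))
print(best_by_average, best_by_threshold)  # Model 3 Model 2
```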
104. Full Stack Deep Learning - UC Berkeley Spring 2021
Domain-specific metrics: mAP
[Figure: precision-recall curve, precision vs. recall from 0% to 100%; AP is the area under the curve]
mAP = mean AP,
e.g., over classes
ML Projects - metrics
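One common way to compute AP over ranked predictions is to average precision@k at each rank where a true positive appears. A minimal sketch, assuming per-class rankings with binary relevance labels (the rankings below are made up for illustration):

```python
def average_precision(ranked_relevance):
    # AP: average of precision@k at each rank k where a true positive appears.
    hits, precision_sum = 0, 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0

def mean_average_precision(per_class_rankings):
    # mAP = mean AP, e.g., over classes.
    return sum(average_precision(r) for r in per_class_rankings) / len(per_class_rankings)

# Hypothetical ranked predictions for two classes (1 = correct, 0 = incorrect).
rankings = [[1, 0, 1, 0],
            [0, 1, 1, 0]]
print(mean_average_precision(rankings))
```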
106. Full Stack Deep Learning - UC Berkeley Spring 2021
Combining precision and recall
            Precision   Recall   (p + r) / 2   p @ (r > 0.6)   mAP
Model 1        0.9        0.5        0.7            0.0         0.7
Model 2        0.8        0.7        0.75           0.8         0.6
Model 3        0.7        0.9        0.8            0.7         0.6
ML Projects - metrics
108. Full Stack Deep Learning - UC Berkeley Spring 2021
Example: choosing a metric for pose estimation
• Position (x, y, z): L2 loss
• Orientation (φ, θ, ψ): L2 loss
• Prediction time t

Xiang, Yu, et al. "PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes." arXiv preprint arXiv:1711.00199 (2017).
ML Projects - metrics
109. Full Stack Deep Learning - UC Berkeley Spring 2021
Example: choosing a metric for pose estimation
• Enumerate requirements
• Downstream goal is real-time robotic grasping
• Position error must be <1 cm; not sure exactly how much precision is needed
• Angular error must be <5 degrees
• Must run in 100 ms to work in real time
109
ML Projects - metrics
110. Full Stack Deep Learning - UC Berkeley Spring 2021
Example: choosing a metric for pose estimation
• Enumerate requirements
• Evaluate current performance
• Train a few models
110
ML Projects - metrics
111. Full Stack Deep Learning - UC Berkeley Spring 2021
Example: choosing a metric for pose estimation
• Enumerate requirements
• Evaluate current performance
• Compare current performance to requirements
• Position error between 0.75 and 1.25cm (depending on
hyperparameters)
• All angular errors around 60 degrees
• Inference time ~300ms
111
ML Projects - metrics
112. Full Stack Deep Learning - UC Berkeley Spring 2021
Example: choosing a metric for pose estimation
• Enumerate requirements
• Evaluate current performance
• Compare current performance to requirements
• Prioritize angular error
• Threshold position error at 1cm
• Ignore run time for now
112
ML Projects - metrics
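The decision on this slide (threshold position error at 1 cm, prioritize angular error, ignore run time for now) can be expressed as one comparable number per model. A sketch; the run names and error values below are hypothetical:

```python
def pose_score(position_error_cm, angular_error_deg, max_position_cm=1.0):
    # Lower is better: the position requirement is a hard threshold,
    # angular error is the metric being optimized. Run time is ignored for now.
    if position_error_cm > max_position_cm:
        return float("inf")  # fails the thresholded requirement
    return angular_error_deg

# Hypothetical results from two hyperparameter settings.
runs = {
    "run A": pose_score(0.75, 62.0),  # within 1 cm -> scored by angular error
    "run B": pose_score(1.25, 58.0),  # position too far off -> disqualified
}
best = min(runs, key=runs.get)
print(best)  # run A
```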
113. Full Stack Deep Learning - UC Berkeley Spring 2021
Example: choosing a metric for pose estimation
• Enumerate requirements
• Evaluate current performance
• Compare current performance to requirements
• Revisit metric as your numbers improve
113
ML Projects - metrics
114. Full Stack Deep Learning - UC Berkeley Spring 2021
Key points for choosing a metric
A. The real world is messy; you usually care about lots of
metrics
B. However, ML systems work best when optimizing a single
number
C. As a result, you need to pick a formula for combining
metrics
D. This formula can and will change!
114
ML Projects - metrics
115. Full Stack Deep Learning - UC Berkeley Spring 2021
Questions?
115
ML Projects - metrics
116. Full Stack Deep Learning - UC Berkeley Spring 2021
Module overview

• Lifecycle: how to think about all of the activities in an ML project
• Prioritizing projects: assessing the feasibility and impact of your projects
• Archetypes: the main categories of ML projects, and the implications for project management
• Metrics: how to pick a single number to optimize
• Baselines: how to know if your model is performing well
ML Projects - baselines
117. Full Stack Deep Learning - UC Berkeley Spring 2021
Key points for choosing baselines
A. Baselines give you a lower bound on expected model
performance
B. The tighter the lower bound, the more useful the baseline
(e.g., published results, carefully tuned pipelines, & human
baselines are better)
117
ML Projects - baselines
118. Full Stack Deep Learning - UC Berkeley Spring 2021
Why are baselines important?
118
ML Projects - baselines
119. Full Stack Deep Learning - UC Berkeley Spring 2021
Why are baselines important?
119
Same model, different baseline -> different next steps
ML Projects - baselines
120. Full Stack Deep Learning - UC Berkeley Spring 2021
Where to look for baselines
External baselines:
• Business / engineering requirements
ML Projects - baselines
121. Full Stack Deep Learning - UC Berkeley Spring 2021
Where to look for baselines
External baselines:
• Business / engineering requirements
• Published results (make sure the comparison is fair!)
ML Projects - baselines
124. Full Stack Deep Learning - UC Berkeley Spring 2021
Where to look for baselines
External baselines:
• Business / engineering requirements
• Published results

Internal baselines:
• Scripted baselines (e.g., OpenCV, rules-based)
• Simple ML baselines (e.g., bag of words, linear regression)
• Human performance
ML Projects - baselines
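A simple internal baseline can be just a few lines. As an illustration (the labels and examples below are made up), a majority-class baseline gives a lower bound that any trained model must beat:

```python
from collections import Counter

def majority_class_baseline(train_labels):
    # Predict the most common training label for every input.
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda example: majority

# Hypothetical binary-labeled data.
train_labels = [1, 1, 0, 1, 0, 1]
predict = majority_class_baseline(train_labels)

test_set = [("a", 1), ("b", 0), ("c", 1), ("d", 1)]
accuracy = sum(predict(x) == y for x, y in test_set) / len(test_set)
print(accuracy)  # 0.75
```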
125. Full Stack Deep Learning - UC Berkeley Spring 2021
How to create good human baselines
Quality of baseline increases (and ease of data collection decreases) as you go down the list:
• Random people (e.g., Amazon Turk)
• Ensemble of random people
• Domain experts (e.g., doctors)
• Deep domain experts (e.g., specialists)
• Mixture of experts
ML Projects - baselines
126. Full Stack Deep Learning - UC Berkeley Spring 2021
How to create good human baselines
• Highest quality that allows more data to be labeled easily
• More specialized domains need more skilled labelers
• Find cases where model performs worse and concentrate data collection
there
126
More on labeling in data lecture!
ML Projects - baselines
127. Full Stack Deep Learning - UC Berkeley Spring 2021
Key points for choosing baselines
A. Baselines give you a lower bound on expected model
performance
B. The tighter the lower bound, the more useful the baseline
(e.g., published results, carefully tuned pipelines, human
baselines are better)
127
ML Projects - baselines
128. Full Stack Deep Learning - UC Berkeley Spring 2021
Questions?
128
ML Projects - baselines
129. Full Stack Deep Learning - UC Berkeley Spring 2021
Conclusion

• Lifecycle: ML projects are iterative. Deploy something fast to begin the cycle.
• Prioritizing projects: choose projects that are high impact with low cost of wrong predictions.
• Archetypes: the secret sauce to making projects work well is to build automated data flywheels.
• Metrics: in the real world you care about many things, but you should always have just one you're working on.
• Baselines: good baselines help you invest your effort the right way.
ML Projects - conclusion
130. Full Stack Deep Learning - UC Berkeley Spring 2021
Where to go to learn more
• Andrew Ng’s “Machine Learning Yearning”
• Andrej Karpathy’s “Software 2.0”
• Agrawal’s “The Economics of AI”
• Chip Huyen’s “Introduction to Machine Learning Systems Design”
• Apple’s “Human Interface Guidelines for Machine Learning”
• Google’s “Rules of Machine Learning”
130
ML Projects - conclusion
131. Full Stack Deep Learning - UC Berkeley Spring 2021
Thank you!
131