Deep Learning in the Real World
Lukas Biewald
@L2K
CrowdFlower
O’Reilly
Excitement Around Deep Learning
Machine Learning vs Statistics Glossary
(Robert Tibshirani)
Machine Learning | Statistics
Learning | Fitting
Generalization | Test Set Performance
Supervised Learning | Regression, Classification
Unsupervised Learning | Density estimation, clustering
large grant = $1,000,000 | large grant = $50,000
nice place to have a meeting: Snowbird, Utah, French Alps | nice place to have a meeting: Las Vegas in August
Venture Capital Investment in Deep Learning
https://medium.com/startup-grind
Inevitable Backlash
What’s Actually Working?
Image Recognition
Medical
What are the Challenges?
Proprietary and Confidential - Do Not Distribute
Human Generated Code vs Machine Generated Code
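The "meaningful diffs" point from the speaker notes shows up even at toy scale. This is a minimal sketch with hypothetical data, where a three-parameter logistic regression stands in for a model with millions of weights. Retraining after adding one example shifts every parameter. A v1-to-v2 diff of the weights therefore cannot localize the change the way a diff of a 5% code edit can.

```python
import math

def train_logreg(data, epochs=200, lr=0.1):
    """Batch gradient descent for 2-feature logistic regression; returns [bias, w1, w2]."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in data:
            p = 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x1 + w[2] * x2)))
            err = p - y  # gradient of the log loss with respect to the logit
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w

# Hypothetical toy data: the label is 1 when x1 + x2 is large.
V1_DATA = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((0.0, 1.0), 0),
           ((2.0, 2.0), 1), ((3.0, 1.0), 1)]
V2_DATA = V1_DATA + [((1.0, 2.0), 1)]  # "v2" adds a single labeled example

w_v1 = train_logreg(V1_DATA)
w_v2 = train_logreg(V2_DATA)
changed = sum(abs(a - b) > 1e-9 for a, b in zip(w_v1, w_v2))
print(f"{changed} of {len(w_v1)} parameters changed")
```

Real networks behave the same way across millions of weights at once. For that reason, model versions are usually compared by their behavior on held-out data rather than by diffing parameters.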
2001: A Space Odyssey
https://www.youtube.com/watch?v=MzIQUDQO-ag
Machine Learning Projects are Really Hard to Manage
Kaggle Accuracy
[Chart: Accuracy of Best Performing Model; x-axis: Baseline, 12-May to 15-May; y-axis: accuracy, 0% to 70%]
Kaggle accuracy over time
[Chart: Accuracy of the Best Performing Model; x-axis: 13-May to 7-Jul; y-axis: accuracy, 0% to 80%]
Kaggle Participation
[Chart: Number of Participating Teams; x-axis: 13-May to 1-Jul; y-axis: teams, 0 to 1,400]
Netflix Prize
Self Driving Cars - Close or Far?
Machine Learning Can Be Unpredictable and Opaque
Image Classification Success
Image Classification Errors
Image Classification Errors
Image Classification Errors
AlphaGo’s Mistake
Criminal Risk Scores
Explainability of Neural Networks
Deep Learning Can Be Vulnerable to Hacking
https://github.com/jmgilmer/AdversarialMNIST
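The panda-to-gibbon example in the speaker notes is the one usually shown with the fast gradient sign method (FGSM). Below is a minimal sketch of that idea. It assumes a plain linear logistic model and a hypothetical 1,000-pixel input, not a real image network. Because the model is a differentiable formula, its gradient tells an attacker which way to nudge every pixel.

```python
import math

def predict_proba(w, b, x):
    """Probability of class 1 under a linear logistic model."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def _sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method against a logistic model.

    For binary cross-entropy, d(loss)/dx_i = (p - y) * w_i, so each input
    coordinate is moved by eps in the direction that increases the loss.
    """
    p = predict_proba(w, b, x)
    return [xi + eps * _sign((p - y) * wi) for xi, wi in zip(x, w)]

# Hypothetical 1,000-"pixel" input and weights; the true label is 1.
N = 1000
w = [0.1 if i % 2 == 0 else -0.1 for i in range(N)]
b = 0.0
x = [0.51 if i % 2 == 0 else 0.49 for i in range(N)]

x_adv = fgsm(w, b, x, y=1, eps=0.02)
print(f"clean: {predict_proba(w, b, x):.3f}, adversarial: {predict_proba(w, b, x_adv):.3f}")
print(f"largest per-pixel change: {max(abs(a - c) for a, c in zip(x_adv, x)):.3f}")
```

No pixel moves by more than 0.02. Even so, the probability of the true class drops from about 0.73 to about 0.27, because the tiny changes are coordinated across 1,000 dimensions. Against a deep network, the same gradient would be computed by backpropagation.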
Glasses Fooling Face Recognition
Machine Learning Requires Training Data
The Effect of Better Algorithms
The Effect of Better Features
The Effect of More Data
The Effect of Cleaner Data
Where Do Data Scientists Spend Their Time?
CrowdFlower AI Platform: Training Data
• Multiple use cases
• Multiple data formats
• Templatized workflow
Training Data
Human-in-the-loop
Machine Learning
CrowdFlower AI Platform: Training Data
• Image labeling for self driving cars
• Pixel-level categorization done by machine and humans
Training Data
Human-in-the-loop
Machine Learning
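Pixel-level categorization records more than a bounding box does. This toy sketch uses a hypothetical 4×6 label map (0 = road, 1 = tree, 2 = pedestrian). The box can always be derived from the mask, but not the other way round.

```python
# Hypothetical per-pixel label map: 0 = road, 1 = tree, 2 = pedestrian.
MASK = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 0, 0],
    [0, 0, 0, 2, 2, 0],
    [0, 0, 0, 2, 0, 0],
]

def bounding_box(mask, label):
    """Smallest inclusive (top, left, bottom, right) box around pixels with `label`."""
    cells = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == label]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)

def pixel_count(mask, label):
    """Number of pixels the segmentation assigns to `label`."""
    return sum(v == label for row in mask for v in row)

top, left, bottom, right = bounding_box(MASK, 2)
box_area = (bottom - top + 1) * (right - left + 1)
print(f"box={(top, left, bottom, right)}, box area={box_area}, pedestrian pixels={pixel_count(MASK, 2)}")
```

The box covers 6 pixels, but only 4 of them are pedestrian, and the mask records exactly which 4. That extra detail is also why labeling at this granularity takes annotators so much more work.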
???
The Combination of Humans and Computers is Powerful
Advanced Chess
Human in the Loop
AI Classifier → Output
Human in the Loop
AI Classifier → Confident → Output; otherwise Human Annotation → Output
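The speaker notes describe confidence-threshold routing: accept the classifier's output above 99% confidence, otherwise ask a human. It can be sketched as follows. The `Prediction` record, the item IDs and the scores are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's probability for `label`, between 0 and 1

def route(predictions, threshold=0.99):
    """Split predictions into (accepted automatically, sent to a human).

    Predictions strictly above `threshold` are trusted; the rest go to a
    human annotator. Raising the threshold sends more work to people but
    lets fewer machine mistakes through.
    """
    auto, to_human = [], []
    for p in predictions:
        (auto if p.confidence > threshold else to_human).append(p)
    return auto, to_human

preds = [
    Prediction("img-1", "pedestrian", 0.997),
    Prediction("img-2", "pedestrian", 0.62),
    Prediction("img-3", "no pedestrian", 0.995),
    Prediction("img-4", "no pedestrian", 0.80),
]
auto, to_human = route(preds)
print("auto:", [p.item_id for p in auto])
print("human:", [p.item_id for p in to_human])
```

The threshold is as much a business decision as a modeling one. It trades human labeling cost against the error rate the process can tolerate.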
Human in the Loop
Active Learning: AI Classifier → Confident → Output; otherwise Human Annotation, which is fed back to the AI Classifier
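Here is a toy simulation of the active-learning loop, under loud assumptions:

- The data are hypothetical 1-D scores.
- `human_label` stands in for a human annotator who knows the hidden rule x >= 0.6.
- The "model" is a simple threshold learner. It is confident only outside the gap between its known negatives and positives.

The items the model is unsure about are the ones sent to people (the uncertainty-sampling flavor of active learning). Each human answer is fed straight back into the model, so the number of items routed to people shrinks week by week.

```python
def human_label(x):
    """Hypothetical human annotator; the hidden ground truth is x >= 0.6."""
    return int(x >= 0.6)

class ThresholdModel:
    """1-D threshold learner that knows where it is unsure.

    Scores at or below the largest known negative are confidently 0, scores at
    or above the smallest known positive are confidently 1, and anything in
    between is uncertain.
    """

    def __init__(self):
        self.max_neg = float("-inf")
        self.min_pos = float("inf")

    def learn(self, x, y):
        if y == 1:
            self.min_pos = min(self.min_pos, x)
        else:
            self.max_neg = max(self.max_neg, x)

    def predict(self, x):
        """Return 0 or 1, or None when the model is not confident."""
        if x <= self.max_neg:
            return 0
        if x >= self.min_pos:
            return 1
        return None

def run_week(model, batch):
    """Process one batch; return how many items had to go to a human."""
    sent_to_human = 0
    for x in batch:
        if model.predict(x) is None:
            model.learn(x, human_label(x))  # feed the human label back
            sent_to_human += 1
    return sent_to_human

model = ThresholdModel()
model.learn(0.0, 0)  # two seed labels
model.learn(1.0, 1)
weeks = [
    [0.1, 0.3, 0.5, 0.7, 0.9],
    [0.15, 0.35, 0.55, 0.65, 0.85],
    [0.05, 0.45, 0.6, 0.75, 0.95],
]
humans_per_week = [run_week(model, batch) for batch in weeks]
print(humans_per_week)
```

Because the hidden rule here is a single threshold, the model is never confidently wrong. Real classifiers can be, which is why the acceptance threshold still matters.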
United States Postal Service (1982)
CrowdFlower AI Platform: Case Study
Training Data
Human-in-the-loop
Machine Learning
400,000 structured support tickets create initial ML model
200,000 new support tickets per week fed into ML model
40% initial output by model; 60% handled by human review
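The case-study figures imply a weekly workload for each side of the loop. Here is a quick check of the arithmetic. It assumes, though the slide does not say so explicitly, that the 40%/60% split applies to all 200,000 new tickets each week.

```python
WEEKLY_TICKETS = 200_000  # new support tickets per week (from the slide)
MODEL_SHARE = 0.40        # share of initial output produced by the model
# Assumption: the 40/60 split applies uniformly to every week's tickets.

def weekly_split(tickets: int, model_share: float) -> tuple[int, int]:
    """Return (handled_by_model, sent_to_human_review) for one week."""
    by_model = round(tickets * model_share)
    return by_model, tickets - by_model

by_model, by_human = weekly_split(WEEKLY_TICKETS, MODEL_SHARE)
print(f"model: {by_model:,} tickets/week, human review: {by_human:,} tickets/week")
```

Under that assumption, the model handles about 80,000 tickets a week and human reviewers about 120,000.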
Machine Learning Can Look at Far More Data than Humans
The Unreasonable Effectiveness of Data Revisited
(Google Blog 2017)
Breakthroughs and Data Sets
Alexander Wissner-Gross
Massive Free Datasets
Audio Set
Transfer Learning is the Future
New Data Sets
Freiburg Groceries Data Set
Inception
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_loca
Visualizing Deep Learning Networks - Layer 1
https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
Visualizing Deep Learning Networks - Layer 2
https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
Visualizing Deep Learning Networks - Layer 3
https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
Visualizing Deep Learning Networks - Layer 4
https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
Visualizing Deep Learning Networks - Layer 5
https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
Using DNNs as feature extractors
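The "DNN as feature extractor" pattern keeps a pretrained network frozen and trains only a small new classifier on its outputs. This deliberately tiny sketch rests on stated assumptions:

- `pretrained_features` is a hypothetical fixed function standing in for a real network's penultimate layer.
- The new head is a nearest-centroid classifier fit on four hypothetical examples.

Fine tuning would also update the backbone's weights, which this sketch does not attempt.

```python
def pretrained_features(point):
    """Hypothetical stand-in for a frozen, pretrained network.

    In practice this would be a CNN's penultimate-layer activations; here it is
    a fixed nonlinear (ReLU) map whose parameters are never updated.
    """
    x0, x1 = point
    return [max(0.0, x0 + x1), max(0.0, x0 - x1), max(0.0, x1 - x0)]

def fit_centroids(examples):
    """Train only the new head: one mean feature vector per label."""
    sums, counts = {}, {}
    for point, label in examples:
        feats = pretrained_features(point)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, point):
    """Nearest centroid in the frozen feature space."""
    feats = pretrained_features(point)

    def sq_dist(c):
        return sum((f - ci) ** 2 for f, ci in zip(feats, c))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical small labeled set for a new task.
TRAIN = [((2.0, 0.0), "A"), ((3.0, 1.0), "A"), ((0.0, 2.0), "B"), ((1.0, 3.0), "B")]
head = fit_centroids(TRAIN)
print(predict(head, (5.0, 1.0)), predict(head, (1.0, 4.0)))
```

Only `fit_centroids` learns anything here. That is the appeal of the pattern when, as the speaker notes put it, your data set probably isn't in ImageNet: the expensive representation is reused, and the new task needs only a small amount of labeled data.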
Retraining Neural Networks (Fine Tuning)
Fine Tuning Accuracy Improvements
77% Accuracy With Fine Tuning
47% Accuracy Without Fine Tuning
https://shuaiw.github.io/2017/03/09/smaller-faster-deep-learning-models.html
Dermatologist-level classification of skin cancer with deep neural networks
Multi Task Learning
https://sorenbouma.github.io/blog/oneshot/
One-Shot Learning
Synthetic Training Data
Thank You
Lukas Biewald (@L2K)

Speaker Notes

  1. I’m Lukas Biewald, and I want to talk about deep learning in the real world: what's actually working now, and the problems we're likely to face in the next few years as it becomes more and more ubiquitous.
  2. The company I founded, CrowdFlower, is a San Francisco startup that has helped companies like Bloomberg, Salesforce, Google, Coca-Cola and Home Depot build and deploy deep learning systems, so I’ve seen a lot of businesses succeed and fail.
  3. I also build robots in my basement, which keeps me close to the applications of deep learning.
  4. I have to show you one of my projects: this is a drone I built that recognizes Chris and understands voice commands.
  5. (5 minutes) Let’s talk about the excitement, maybe the hype, around deep learning.
  6. My brother-in-law is a statistics professor, and machine learning drives him crazy. He says machine learning takes statistical concepts, renames them and markets them. Deep learning drives him even crazier. There’s certainly truth to this table by Tibshirani.
  7. Gartner publishes a hype cycle curve, and deep learning is right at the top of it.
  8. Venture investment is probably the best indicator of a hype bubble, and it has certainly grown exponentially over the past few years.
  9. Backlash articles have started to appear, particularly around Watson, which has been making some especially bold claims.
  10. But I’m here to talk about deep learning in the real world, so let’s talk about what’s actually working. The set of real-world applications is vast and strange. (10 minutes)
  11. One of the reasons deep learning is so exciting is this graph. In 2012 we saw a massive improvement in image recognition.
  12. I tried this in my garage, labeling videos of cars outside in real time.
  13. Speech recognition has had the same set of step-function improvements, which have led to products like the Amazon Echo and Siri.
  14. So what do companies actually do with these algorithms? One thing they do is structure data. Over 80% of social media posts now contain images, which makes understanding who is talking about your brand much, much harder: you can’t just match keywords.
  15. A completely different application of the same technology is identifying skin cancer from images.
  16. Coca-Cola recently announced that they’ve been using deep learning to recognize rewards codes from bottle caps. This isn’t the hardest problem, but there are many different cameras, lighting conditions, etc.
  17. Face recognition is now near human-level performance, as anyone who uses Facebook will know. Computers can now recognize individual people along with their mood, and even whether someone is lying.
  18. This is a doorbell I built that recognizes my friends.
  19. We can now recognize deforestation and measure climate impact in satellite photos at massive scale.
  20. We track elephants to keep them safe from poachers.
  21. The TSA is starting to experiment with deep learning and has found that it can actually be more reliable than humans for detecting weapons.
  22. Deep learning is used to check handbags and decide whether they are counterfeit or not.
  23. Blue River builds tractors that can automatically detect and kill weeds. They were a CrowdFlower customer and were recently bought by John Deere for over $300M.
  24. Semantic segmentation doesn’t just put bounding boxes around items in an image; it attaches every pixel to a meaning such as tree, road or pedestrian.
  25. Real companies now use deep learning and video cameras or robots to check the placement of their products on shelves.
  26. 3DSignals built a system that recognizes mechanical failure by listening to engines.
  27. Researchers have started to build diagnostic systems for cancer just by looking at the shape and distribution of cells. Humayun did this.
  28. A whole ecosystem of companies has sprung up, including my company and Kaggle, which was recently bought by Google.
  29. (15 minutes) But most companies really struggle to productize their machine learning. Why does this happen?
  30. We have TensorFlow, we have Keras, so why is it still hard to make deep learning projects work? I think it’s because machine-generated code is fundamentally different from human-generated code. On the left is some code I wrote and on the right is a model I built. If you’re a programmer you can understand the left, but no one can understand the right. Other huge differences: 1) Human code is at most 1 MB; machine-generated code is 1 GB and getting bigger. 2) Human code has meaningful diffs: if I release a v2, I might edit 5% of this code, whereas retraining a model changes essentially all of its weights. 3) We have 50 years of experience debugging human code. Machine-generated code is coming and it’s replacing human-generated code everywhere, so how do we make it work?
  31. How do we know what’s easy and what’s hard?
  32. (20 minutes)
  33. I started a Kaggle competition a while back.
  34. In the first three days.
  35. After a week
  36. The Netflix Prize showed the same phenomenon.
  37. (25 minutes) We’re used to computers being predictable, reliable and explainable. But deep learning is not.
  38. ImageNet classification is spectacular on the kinds of images it was trained on.
  39. Here are some cases where it makes mistakes, why?
  40. AlphaGo crushed the best human Go player, but it made one really bad move that even amateurs could recognize.
  41. When machine learning gets into unfamiliar situations, it can completely fall apart. My old lab made a helicopter fly upside down, but it took years. At first, every time it got into trouble it would crash out of the sky. It turned out it needed to be trained on bad pilots who would consistently get into tricky situations.
  42. When Tesla’s Autopilot resulted in deaths, our government asked for the code. With older-style controllers it’s obvious where the fault lies: you can step through the code. But with ML it’s less clear. With just the code and not the data it was trained on, what does the code tell you? Who is at fault: the team that wrote the code, or the team that trained the model?
  43. ProPublica research on criminal risk scores, which predict how likely someone is to be a repeat offender. It’s not a real “deep learning” model, but it soon will be.
  44. One way to do explanations.
  45. (30 minutes) This might feel like science fiction, but it’s a real issue. When you have a brain you can run experiments on, it can be very vulnerable to hacking.
  46. On the left, the model is predicting panda. A tiny bit of noise is added. On the right, the model is predicting gibbon. This might only happen on one in a trillion images, but because our code is essentially a formula, we can systematically find the images that will mess it up.
  47. On the left, an 8. On the right, the model thinks the image is a 5.
  48. Researchers have applied this to the real world. When the subjects in the top row wear custom glasses, the facial recognition system thinks they are the people in the bottom row.
  49. Street signs and self-driving cars may become a huge issue. By adding a little noise to a street sign, the deep learning model can be tricked. Does this mean deep learning is bad? Would our brains be just as vulnerable if we could run millions of experiments on them?
  50. (35 minutes) Training data is essential to deep learning.
  51. Peter Norvig, head of research at Google, observed this in 2009 in his famous paper The Unreasonable Effectiveness of Data. I’ll quote him: “The biggest successes in natural-language-related machine learning have been speech recognition and machine translation. The reason for these successes is not that these tasks are easier than other tasks; they are in fact much harder … The reason is that a large training set is available to us in the wild.” In other words, the use cases that work, work because there’s lots of data available.
  52. Norvig wasn’t the first to notice this phenomenon. A few years earlier, Banko and Brill at Microsoft Research observed the same thing by measuring the accuracy of several different machine learning algorithms across what seemed at the time like a wide range of training set sizes, from 100,000 words to a billion words. They saw an effect we see all the time: the algorithms are very similar in accuracy at every training set size, but all of them improve consistently with more data. It says something amazing about our industry that a paper from 2001 feels almost like an anachronism, and that a billion words feels like a tiny data set today.
  53. A crucial piece of what my company, CrowdFlower, does is help companies build training data sets. We give our customers templates.
  54. Collecting training data can be tough, as with this interface for labeling every pixel in an image.
  55. If you look at these particular algorithms, they seem like they might be flattening out. Did the accuracy curve flatten out at bigger data sets? What happened? A lot of you probably know. My old professor Andrew Ng (another one of CrowdFlower’s inspirations) has an answer in one of his famous slides.
  56. Deep learning has allowed the trend to continue. Deep learning models are able to ingest and use even more data than models in the past.
  57. (40 minutes) So why are companies using deep learning with all these flaws? At the highest level, I think it’s because deep learning is a different kind of intelligence, and the combination of humans and computers is really powerful.
  58. 20 years ago, Deep Blue beat Garry Kasparov. There’s a game called advanced chess, in which human players team up with chess computers.
  59. There’s a simple, powerful design pattern behind a huge fraction of the successful deployments of deep learning. Human in the loop is simple, and almost all of our customers that really use deep learning use some form of it. In the real world, deep learning algorithms make mistakes, and we have to be able to deal with those mistakes. At its most basic, you take the result of a classifier and use the output where it’s confident. Say we have an image classification algorithm: if it’s more than 99% sure that there is a pedestrian, it decides there is a pedestrian.
  60. But if the algorithm is not confident, it sends the item to a human operator to label. The business process can then work at a very high degree of confidence even if the algorithm is less than 100% accurate.
  61. Very importantly, the human annotation can be sent back to the algorithm to make it better. This process is called active learning: the human labels are sent back to the classifier for retraining and reused to improve it over time. So fewer and fewer results are sent to a human, and your business process becomes more and more efficient.
  62. The US Postal Service has been doing this since 1982. They now have 99.5% accuracy, but 0.5% of letters still go to a human. Without the human-in-the-loop design pattern
  63. Coca-Cola recently announced that they’ve been using deep learning to recognize rewards codes from bottle caps. This isn’t the hardest problem, but there are many different cameras, lighting conditions, etc.
  64. So where there’s a mistake, they have a human check that the code was entered correctly.
  65. In the cases where the
  66. To give you a real-world example of a CrowdFlower customer
  67. (45 minutes) Humans get bored and machines don’t. Computers can ingest more training data than a human can in millions of lifetimes. This lets deep learning do things that humans never could.
  68. I’ve worked on human-in-the-loop and training data for over 10 years. Why do I care so much about this? Obviously correlation doesn’t imply causation, but major breakthroughs seem to come just after the data set is made available and long after the algorithm is invented. This table shows just that. Speech recognition came long after hidden Markov models but just after a Wall Street Journal corpus. Google’s object classification came long after neural networks but just after the collection of ImageNet, the first large image corpus. We want to help you make breakthroughs by giving you crucial data sets, and that’s why we think our work is so important. People might not talk as much about training data, but it’s the engine that drives innovation.
  69. (50 minutes) Where are things going? I believe the most important development that everyone should know about is transfer learning.
  70. Your data set probably isn’t in ImageNet.
  71. This is a little robot I built that recognizes objects.
  72. Deep learning models for vision are loosely modeled after the human visual cortex and are built in layers. The pixels come in on the left and the predictions leave on the right. Each layer recognizes progressively more complicated features.
  73. For example, here’s what the first layer is seeing.
  74. The training set has 129,450 clinical images (two orders of magnitude larger than previous datasets), covering 2,032 different diseases.
  75. This is a robot trained on simulations. Simulations can generate infinite amounts of training data, but computers are incredibly good at finding and exploiting flaws in the simulator. We need the simulations to be as varied and messy as the world we live in.
  76. But we’re working on it. In the future, machine learning may be deeply tied to effectively simulating the real world. It sounds like science fiction, but it’s really 3 to 5 years away. Unfortunately, we don’t really know how to simulate things as simple as towel folding. Simulation could actually become the most important field of ML.