2. Deep Learning
• Google is using Machine Learning
• Machine Learning is difficult
• Requires domain knowledge from human experts
Deep Learning:
• Great performance on many problems
• Works well with large amounts of data
• Requires less domain knowledge
Focus:
• Scale deep learning to bigger models and bigger problems
Quoc V. Le
5. What is Deep Learning?
A stack of learned nonlinear transformations:
• u = g(A x)
• v = g(B u)
• …
where x is the input (images, audio, texts, etc.), A and B are weight matrices, and g is a nonlinearity.
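A minimal sketch of this stack in NumPy, assuming tanh as the nonlinearity g and illustrative layer sizes (neither is specified on the slide):

```python
import numpy as np

def g(z):
    # Nonlinearity; tanh is a common choice (an assumption here).
    return np.tanh(z)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # input x (e.g. pixels); dimension chosen for illustration
A = rng.standard_normal((5, 4))   # first-layer weights
B = rng.standard_normal((3, 5))   # second-layer weights

u = g(A @ x)   # first-layer features: u = g(A x)
v = g(B @ u)   # second-layer features: v = g(B u)
```

Deeper models simply repeat this pattern, each layer transforming the features produced by the layer below.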
7. High-level features by Deep Learning
Feature hierarchy, from bottom to top:
• Pixels
• Edge detectors
• …
• Face detector, cat detector
8. Google’s DistBelief
Goal: Train deep learning models on many machines
Model: a multi-layered architecture
• Forward pass to compute the features
• Backward pass to compute the gradient
9. Model partition with DistBelief
DistBelief distributes a model across multiple machines and multiple cores; each machine holds one model partition.
11. Model partition with DistBelief
• Training uses Stochastic Gradient Descent (SGD)
• Model parameters are partitioned
• Can use up to 1000 cores
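For reference, a minimal single-machine SGD loop on a toy linear model; DistBelief partitions these parameters across machines, which this sketch does not show. The data, learning rate, and epoch count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))        # toy training data (made up)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                           # noiseless targets

w = np.zeros(3)                          # model parameters
lr = 0.1                                 # learning rate (assumed value)
for epoch in range(50):
    for i in rng.permutation(len(X)):
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of squared error on one example
        w -= lr * grad                   # SGD update on a single example
```

Each step touches one example, which is what makes SGD easy to spread over many workers.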
12. Model partition with DistBelief
• But training is still slow on large data sets
• Can we add more parallelism?
• Idea: train multiple models on different partitions of the data, and merge them
13. Data partition with DistBelief
• The training data is split into shards; each model replica (worker) trains on its own shard
• Workers send their updates ∆p to a central parameter server
• The parameter server applies p’ = p + ∆p and sends the new parameters p’ back to the workers
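A toy sketch of the parameter-server update p’ = p + ∆p, with workers simulated sequentially (the real system applies updates asynchronously). The data, shard count, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((90, 3))               # toy data (made up)
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true

shards = np.array_split(np.arange(len(X)), 3)  # data partitioned across 3 workers

def worker_delta(p, idx, lr=0.05):
    # One worker's contribution: a gradient step computed on its own shard.
    Xs, ys = X[idx], y[idx]
    grad = Xs.T @ (Xs @ p - ys) / len(idx)
    return -lr * grad

p = np.zeros(3)                 # parameters held by the server
for step in range(500):
    for shard in shards:
        dp = worker_delta(p, shard)  # worker computes ∆p
        p = p + dp                   # server applies p' = p + ∆p
```

In the asynchronous version, a worker's ∆p may be computed against a slightly stale copy of p, which DistBelief tolerates in exchange for throughput.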
14. Parallelism in DistBelief
• Model parallelism via model partitioning
• Data parallelism via data partitioning and asynchronous communication
• DistBelief can scale to billions of examples and use 100,000 cores or more
• Thanks to its speed, DistBelief dramatically improves many applications
25. Text understanding
• Very useful but also difficult
• We should try to understand the meaning of words
• Deep Learning can learn the meaning of words
27. Predicting the next word in a sentence
Each context word (“the”, “cat”, “sat”, “on”, “the”) is looked up in a word matrix E; the resulting vectors feed into hidden layers, and a classifier predicts the next word.
E is a matrix of dimension ||Vocab|| x d
28. Visualizing the word vectors
• Example nearest neighbors from word vectors trained on Google News: for “apple”, neighbors include “Apple” and “iPhone”
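Nearest neighbors in embedding space are typically found by cosine similarity. A sketch, using made-up vectors rather than trained Google News embeddings:

```python
import numpy as np

# Toy word vectors, invented for illustration only.
vectors = {
    "apple":  np.array([1.0, 0.2, 0.0]),
    "Apple":  np.array([0.9, 0.3, 0.1]),
    "iPhone": np.array([0.8, 0.4, 0.2]),
    "banana": np.array([0.1, 1.0, 0.0]),
}

def nearest(word, k=2):
    q = vectors[word]
    def cos(v):
        # Cosine similarity between v and the query vector q.
        return v @ q / (np.linalg.norm(v) * np.linalg.norm(q))
    others = [(w, cos(v)) for w, v in vectors.items() if w != word]
    return [w for w, _ in sorted(others, key=lambda t: -t[1])[:k]]

neighbors = nearest("apple")
```

With well-trained embeddings, the same lookup surfaces semantically related words, as on the slide.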
32. Joint work with
Kai Chen, Greg Corrado, Rajat Monga, Andrew Ng, Jeff Dean, Matthieu Devin, Paul Tucker, Ke Yang
Additional thanks: Samy Bengio, Tom Dean, Josh Levenberg, Geoff Hinton, Tomas Mikolov, Mark Mao, Patrick Nguyen, Marc’Aurelio Ranzato, Mark Segal, Jon Shlens, Ilya Sutskever, Vincent Vanhoucke