Taking the Road Less Travelled:
In pursuit of a Multi-modal
experience for Bixby
Samsung R&D Bangalore, India
Dr. Vikram Vij
vikram.v@samsung.com
Intelligent Assistants are fast emerging as the next breakthrough
user interface
1990s
Web
2000s
Apps
Today
Assistants
Images references form
Evolution of Human Computer Interface
GUI
(~1980s)
Touch UI
(~2000)
Voice
(2011)
Bixby
(2017)
Changes of Interface Paradigm
Voice Assistant Market Research Report
Global Forecast 2023
Reference : https://www.marketresearchfuture.com/reports/voice-assistant-market-4003
Bixby Introduction
Bixby is an intelligent, personalized voice interface for your phone.
It is multi-modal: it lets you seamlessly switch between voice and touch modes.
o Launch Date : 19th July 2017 (US), 22nd Aug (Global)
o Available in more than 200 countries
o More than 75 Domains supported (Camera, Gallery, Messages, WhatsApp, YouTube, Uber etc.)
o More than 27 million registered users
http://bixby.samsung.com/meet-bixby
https://www.youtube.com/watch?v=dbmVtseEjo4&index=1&list=PLrV44rSVouDcbvky1f77mUjWLCq8WI-Z1
https://www.youtube.com/watch?v=Gcd4NpK2fTI
Bixby Live Demo
Bixby Overview
Supporting every task of
the application
Understanding the current
context and state of app
Find an
umbrella photo
Manual editing
VOICE
TOUCH
VOICE
1
2
3
Understanding commands
with incomplete info
Send this photo
via message
To whom?
To Jane
Done
“Incomplete Command”
A true one click action
- Turn on
- Authenticate
- Unlock
- Wake the phone
- Execute the command
Supporting Samsung’s
native apps
……
Request
incomplete.
Error
“Show me the Wi-Fi data
usage”
Press &
Hold
Bixby is fundamentally different from other voice agents or
assistants in the market because of its ..
Post it on
Instagram
Completeness, Context Awareness, Cognitive Tolerance, Frictionless
Bixby - Cognitive Tolerance
ASR
Incomplete or inaccurate instructions are still carried out by using the context.
Bixby | Human Computer Interface Revolution
With English Support, Samsung's Bixby Impresses Vs. Siri And
Google Assistant
Bixby is perhaps in the most precarious spot, as it’s going to be
competing directly against Google Assistant on some devices. Bixby’s
capabilities sound quite impressive thanks to its integration with
other Samsung apps
Galaxy S8's voice sidekick can do things Siri can't
Bixby v1.0: Minimalistic View
ASR
NLU
voice packet
text input
command
ASR
ASR: Automatic Speech Recognition
NLU: Natural Language Understanding
Traditional NLU Flow
NLU
Platform
mom
Text to Mom Machine Learning Models
Command
Domain
Classifier
Intent
Classifier
Slot
Tagger
Messages Send Message “Mom”
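The traditional flow above chains three models: a domain classifier, an intent classifier, and a slot tagger. A minimal sketch of that chain follows, assuming keyword rules as stand-ins for the machine learning models. The rule tables and function names are hypothetical and are not taken from Bixby.

```python
import re

# Hypothetical keyword rules standing in for trained models.
DOMAIN_KEYWORDS = {
    "Messages": ["text", "message"],
    "Gallery": ["photo", "picture"],
}
INTENT_KEYWORDS = {
    "Messages": {"Send Message": ["text", "send"]},
    "Gallery": {"Find Photo": ["find", "show"]},
}


def tokenize(utterance):
    """Lowercase the utterance and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", utterance.lower())


def classify_domain(tokens):
    """Step 1: pick the first domain whose keywords appear in the tokens."""
    for domain, words in DOMAIN_KEYWORDS.items():
        if any(word in tokens for word in words):
            return domain
    return None


def classify_intent(domain, tokens):
    """Step 2: pick an intent, restricted to the chosen domain."""
    for intent, words in INTENT_KEYWORDS.get(domain, {}).items():
        if any(word in tokens for word in words):
            return intent
    return None


def tag_slots(tokens):
    """Step 3: toy slot tagger, the token after 'to' is the recipient."""
    if "to" in tokens:
        position = tokens.index("to")
        if position + 1 < len(tokens):
            return {"recipient": tokens[position + 1]}
    return {}


def understand(utterance):
    tokens = tokenize(utterance)
    domain = classify_domain(tokens)
    intent = classify_intent(domain, tokens) if domain else None
    return {"domain": domain, "intent": intent, "slots": tag_slots(tokens)}
```

Calling `understand("Text to Mom")` walks the same path as the slide: Messages, then Send Message, then the recipient slot.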
Key Challenges
Design
o Text and Voice: Co-existence of Dual Modality
o Representation of Massive Input Space
o Management of Massive Context
o Handling of Variable Output Space
o Design of Deep Learning Architecture to Achieve this
Data
o Managing the distribution and variations of data
o Balance of Data to maintain the expected distribution of data across different
classes
o Special handling for rejection Data
Bixby: The Multi-Modal Point of View
① Home ② Settings ③ Connections ④ Data Usage
Touch Interface Voice Interface
+
“Show me the mobile data usage”
Bixby: The Multi-Modal Point of View (cont’d)
Touch UI
Screen Flow
Voice UI
“Find Hawaii photos in Gallery”
Context Context Context Context
“find James” in Contacts application => contact information of James
“find James” in Gallery application => images tagged as James
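The two "find James" lines show the same utterance resolving to different actions depending on the active application. A minimal sketch of that lookup, assuming a hand-written table keyed by (context, verb). The table, the action names, and `resolve` are hypothetical illustrations, not Bixby's implementation.

```python
# Hypothetical table keyed by (application context, command verb).
CONTEXT_COMMANDS = {
    ("Contacts", "find"): "show_contact_information",
    ("Gallery", "find"): "show_tagged_images",
}


def resolve(context, utterance):
    """Return (action, argument) for an utterance in a given context."""
    verb, _, argument = utterance.strip().partition(" ")
    action = CONTEXT_COMMANDS.get((context, verb.lower()))
    if action is None:
        return None  # no command known for this context
    return (action, argument)
```

The utterance alone is ambiguous; only the pair (context, utterance) identifies the command.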
Leap Required for NLU toward Multi-Modality
Traditional
NLU
Multi-Modal
NLU
Context
Awareness
Massive Number of Contexts Varying Set of Commands
…
…
…
…
…
Thousands of states
Note8 …
…
…
…
…
…
S8
TabS
Various device models,
apps, locales, …
Input Space = (2,000 Contexts) x (Utterances for 6,000 commands)
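As a rough worked figure for the product above, under the simplifying assumption that every command could be reached from every context (an upper bound), and with a hypothetical symbol \bar{u} for the average number of utterances per command (not given in the slides):

```latex
\[
N_{\text{pairs}} = 2{,}000 \times 6{,}000 = 12{,}000{,}000,
\qquad
N_{\text{inputs}} = N_{\text{pairs}} \times \bar{u}
\]
```

Even before counting utterance variations, there are up to 12 million context and command pairs.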
Challenge of Massive Contextual Input Space
“Find James”+
Picture View Context
“Find James”+
Contact View Context
James’ Picture
James’ Contact
…
Static
Classifier
Static
Classifier
Static
Classifier
Static
Classifier
…
…
…
…
…
…
6000+ command classes
Context Space
2000+ contexts
Deep Learning was chosen instead of SVMs, Random Forest etc.
• Massive number of Classes
• Approximately 60 Classes for Domains
• Approximately 6K Classes for Intents
• Closeness of Domains
• The classes are similar in nature
• Examples: Reminder, Calendar and Clock
• Huge Data
• 10M data samples for Domain Classification
• 1.5M data samples for Intent Classification per Domain (on average)
Motivation for Deep Learning
Domain
Classification
Intent
Classification
Slot Tagger
Utterance
… …
… …
Slots
Domain Label
Intent Label
Approach for Massive Contextual Input Space
Context-conditioned DNN classifier + Sampling
Context-Aware
DNN Classifier
Sampling
6000+ commands
Context + Utterance
context_α + utterance_b → command_1
context_α + utterance_c → command_2
…
context_α + utterance_a → command_1
context_β + utterance_b → command_2
context_β + utterance_c → command_2
…
context_β + utterance_a → command_1
…
…
…
Training Set
Input Output
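The training-set layout above pairs each context with an utterance as the input and uses the command as the output, with sampling to keep the set tractable. A minimal sketch of that construction, assuming the context is prepended to the utterance as a plain token and that sampling is uniform. Both choices, and the function name, are assumptions for illustration.

```python
import random


def build_training_set(labelled, sample_size, seed=0):
    """Build (input_text, command) rows from context-conditioned labels.

    labelled maps (context, utterance) -> command.
    At most sample_size rows are returned, sampled uniformly at random.
    """
    rows = [
        (f"{context} {utterance}", command)
        for (context, utterance), command in sorted(labelled.items())
    ]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    if len(rows) > sample_size:
        rows = rng.sample(rows, sample_size)
    return rows
```

Because the context is part of the input, one classifier can map the same utterance to different commands, instead of training a separate static classifier per context.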
Hierarchical classifier
Session based architecture
Rejection Logic in Intent
• RNN word model had difficulty in:
• Handling unknowns (word misspellings)
• Learning word inflections (word boundary going beyond representation)
• State based learning
• So switched to CNN character model
Challenge of RNN vs CNN
~~~ utt ~~
~~~ utt ~~
.
.
.
~~~ utt ~~
vs
e.g. “search for s8 plus” goes to calculator domain
e.g. Settings Bluetooth Screen : “turn off please”
Issue: the state is not learnt (the utterance is detected as Wi-Fi off)
• Determining the Optimal Filter Size
• Smaller filter size used for sub-word level features
• Larger filter size used for understanding language structures
Challenge of CNN Filter Size
Multiple filters with various sizes work in parallel
Final layer of CNN which gives best output
Reference : hackerearth.com
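The idea of several filter widths working in parallel over characters can be sketched in plain Python. This is a simplified forward pass only, assuming each character is reduced to a single scaled code point rather than a learned embedding, and using fixed example kernels rather than trained weights. All names and values are hypothetical.

```python
def char_codes(text, vocab_size=128):
    """Map each character to a value in [0, 1); unknown code points are clipped."""
    return [min(ord(ch), vocab_size - 1) / vocab_size for ch in text.lower()]


def conv1d_max(sequence, kernel):
    """Valid 1-D convolution with ReLU, followed by max-over-time pooling."""
    width = len(kernel)
    if len(sequence) < width:
        return 0.0  # input shorter than the filter: no activation
    activations = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        value = sum(x * w for x, w in zip(window, kernel))
        activations.append(max(0.0, value))
    return max(activations)


def extract_features(text, kernels):
    """Apply every kernel in parallel and return one pooled feature per kernel."""
    sequence = char_codes(text)
    return [conv1d_max(sequence, kernel) for kernel in kernels]


# Example kernels of width 2, 3 and 5 (illustrative values, not trained weights).
EXAMPLE_KERNELS = [
    [1.0, -1.0],
    [0.5, 0.5, 0.5],
    [0.2, 0.2, 0.2, 0.2, 0.2],
]
```

Narrow kernels respond to sub-word patterns such as "s8", while wider kernels span enough characters to cover word sequences; the pooled outputs are concatenated and passed to the classifier layer.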
Challenge of Variable Output Space
App Version  Device Models  Locale
India V 1.1
…
…
…
Turn on Bluetooth tethering
Turn on USB tethering
Turn on tethering
Note8 …
…
…
…
…
…
S8
TabS
Model A
Model B
Approach for Variable Output Space
Version Management Mechanism for NLU Engine
Note 8
Country
Installed app info
OS version
Version Metadata
…
Version mask vectors
V1 …
…
…
…
…
…
V2
V3
Device
Server
Version DB
NLU Core
Command
Classification
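One way to read the version mask vectors above: the classifier scores all commands, and a per-version mask removes commands that the device, locale, or app version does not support. A minimal sketch under that assumption follows. The version keys, mask contents, and function names are hypothetical.

```python
# Hypothetical version database: version key -> set of supported commands.
VERSION_DB = {
    "V1": {"Turn on tethering"},
    "V2": {"Turn on tethering", "Turn on USB tethering"},
    "V3": {
        "Turn on tethering",
        "Turn on USB tethering",
        "Turn on Bluetooth tethering",
    },
}


def mask_vector(version, commands):
    """Return a 0/1 mask aligned with the command list for this version."""
    supported = VERSION_DB.get(version, set())
    return [1 if command in supported else 0 for command in commands]


def classify_with_mask(commands, scores, mask):
    """Pick the highest-scoring command among those the mask allows."""
    allowed = [
        (score, command)
        for command, score, bit in zip(commands, scores, mask)
        if bit == 1
    ]
    if not allowed:
        return None  # nothing supported on this version
    return max(allowed)[1]
```

With this arrangement the classifier itself stays the same across devices; only the mask changes, which is one way to abstract the variable output space away from the model.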
Key Learnings - Design
• Need to experiment with various DNN architectures and parameters; make
sure each experiment has a rationale
• The obvious choice of DNN may not work best: RNNs are typically used for text,
but CNNs proved to be better
• Hierarchical design may work better (e.g. text classification)
• Feature based matching for intent classes where 100% accuracy is needed
• Rule-Based Matching of NER instead of ML/DL based NER
• Rejection Based Intent Classification for Close Domains
• Can abstract out complexity where possible (e.g. variable output space)
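The slides do not say how rejection is implemented. One common approach, shown here purely as an assumption, is to reject when the top score is too low or too close to the runner-up, which is where close domains such as Reminder, Calendar and Clock tend to collide. The threshold and margin values are illustrative.

```python
def classify_or_reject(scores, threshold=0.6, margin=0.1):
    """Return the best label, or "REJECT" when the decision is not confident.

    scores maps label -> probability-like score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return "REJECT"
    best_label, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Reject low-confidence predictions and near-ties between close classes.
    if best_score < threshold or best_score - runner_up < margin:
        return "REJECT"
    return best_label
```

A rejected utterance can then be routed to a fallback, for example a clarification prompt, rather than executing the wrong command.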
Massive Data Flow
Synthetic
Generation of Data
Purchased (3rd Party )
Data
Crawled Data for
Out of Domain
Voice of Customer Data
Quick Grammar Data
DC
Bucketed and annotated
for Single Intent Class
DC and Intent Separated
by Class Levels
Bucketed by Single Intent
Class
Special Data
Market Issues & Bug Fixes
for Intent and Domain
Sampled 2K/Class
Hand-cleaned and Consumed Total
Hand-cleaned & Down Sampled
Sampled 2K/Class
Service API Layer
Intent Slot
Sampled ~10-20K/Class
Sampled ~10-20K/Class
Hand-cleaned & Down Sampled
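The "Sampled 2K/Class" and "Down Sampled" steps in the flow above amount to capping the number of examples per class. A minimal sketch of such a cap, assuming uniform random sampling within each class. The function name and the sampling policy are assumptions, not details from the slides.

```python
import random
from collections import defaultdict


def downsample_per_class(examples, cap, seed=0):
    """Keep at most `cap` examples per class.

    examples is a list of (text, label) pairs.
    """
    by_class = defaultdict(list)
    for text, label in examples:
        by_class[label].append(text)

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for label in sorted(by_class):
        texts = by_class[label]
        chosen = texts if len(texts) <= cap else rng.sample(texts, cap)
        balanced.extend((text, label) for text in chosen)
    return balanced
```

For the figures on the slide, `cap` would be 2000 for the domain-level data; classes with fewer examples than the cap are kept whole.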
Data Governance – Training Data
Used Tools to detect & resolve data conflicts across
Domains & Intents
• TF-IDF based tool
• Cosine similarity based tool
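A data conflict here means near-identical utterances labelled with different domains or intents. The sketch below shows how TF-IDF vectors and cosine similarity can flag such pairs, assuming whitespace tokenization, a smoothed IDF, and an illustrative threshold. It is not the in-house tool, and all names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    total = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + total) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_conflicts(examples, threshold=0.8):
    """Return pairs of (utterance, label) that are similar but labelled differently."""
    vectors = tfidf_vectors([utterance for utterance, _ in examples])
    conflicts = []
    for i in range(len(examples)):
        for j in range(i + 1, len(examples)):
            different = examples[i][1] != examples[j][1]
            if different and cosine(vectors[i], vectors[j]) >= threshold:
                conflicts.append((examples[i], examples[j]))
    return conflicts
```

The pairwise loop is quadratic, so at the data volumes mentioned in the slides a real tool would need bucketing or an index; the sketch only shows the comparison itself.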
Data Governance – Test Data
Unit Testing Automation E2E Testing Automation
In-House Automated
Unit Test Tool for
Domain , Intent and
Slot
DEV
Server
Accepted ? Accepted ?
STG
Server
Accepted ?
PRD
Server
Development and
Management of Data
Analysis based on Data Governance Tool
Y Y Y
N N N
End User
VOC Issues
Key Learnings - Data
• Managing the distribution and variations of data is essential
• Quality of Data is critical
o Balance of Data to maintain the expected distribution of data across different classes
o Special handling for rejection Data
• A Deep Learning Engineer / Data Scientist must spend 30% of their time
looking at the data
• People are needed to manage this volume of data
• Tools / Automation need to be developed for pre-processing of data
• We cannot avoid hand-cleaning or hand-engineering of data
• Obvious need for Data Governance as well as Continuous Monitoring of product
quality.
• The NLP / ML driven project cycle (including data) is quite different from
the conventional software project cycle
ASR: Challenge of Speech
Is different for every speaker
May be fast, slow, or varying in speed
May have high pitch, low pitch, or be whispered
Has widely-varying types of environmental noise
Changes depending on sequence of phonemes
Changes depending on speaking style
May not have distinct boundaries between units
Changes depending on the semantics of the utterance
Has an unlimited number of words
Bixby ASR - Fundamentals
Language
Model(s)
voice packet
Feature
Extraction Decoder
Acoustic
Model(s)
ASR
System
ASR
Hypothesis
Inverse
Text
Normalization
• Acoustic Model
• Links Acoustics to Word/phoneme sequence
• Estimates the likelihood of acoustic sequence given a
word/phoneme (LSTM)
• Language Model
• Prior on word sequences
• Probability of a word given the preceding words (n-gram)
• Decoder
• Find the best word sequence, i.e. searching for the
lowest-cost path in a graph
• Uses Viterbi algorithm (dynamic programming)
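The decoder bullets above describe a lowest-cost path search solved with the Viterbi algorithm. A minimal sketch over a toy hidden Markov model follows, using negative log probabilities as costs. The states, observations, and probabilities are made-up illustrative values, not an acoustic or language model from Bixby.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the lowest-cost (most probable) state sequence."""
    if not observations:
        return []

    def cost(probability):
        # Negative log probability; impossible events get infinite cost.
        return -math.log(probability) if probability > 0 else math.inf

    first = observations[0]
    best = [{s: cost(start_p[s]) + cost(emit_p[s].get(first, 0.0)) for s in states}]
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Cheapest way to arrive in state s from any previous state.
            prev, arrive = min(
                ((p, best[t - 1][p] + cost(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = arrive + cost(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Trace the back-pointers from the cheapest final state.
    path = [min(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model: two confusable phoneme states and three acoustic frame labels.
STATES = ["t", "d"]
START_P = {"t": 0.6, "d": 0.4}
TRANS_P = {"t": {"t": 0.7, "d": 0.3}, "d": {"t": 0.4, "d": 0.6}}
EMIT_P = {
    "t": {"a": 0.5, "b": 0.4, "c": 0.1},
    "d": {"a": 0.1, "b": 0.3, "c": 0.6},
}
```

In an ASR decoder the emission costs would come from the acoustic model and the transition costs from the language model and lexicon; here both are small hand-written tables.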
Bixby ASR - Fundamentals
Bixby ASR – Multi Accent
United States China
India
United Kingdom
Spain South Korea
DEFAULT ACCENTED
On-Boarding
Utterances
SIM Card
Information
Keyboard
Language
Contact
Details
Accent Determination
Based on:
Australia
Canada
Challenge for Indian Market
• Hindi targeted as language of experimentation.
• Indian languages such as Hindi are often used in conjunction with English
e.g. camera खुला करो ("open the camera")
• We have developed a bilingual (English + Hindi) model for the Hindi classifier
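Code-mixed input such as the example above can be inspected by tagging each token with its script. The sketch below uses Unicode character names from the standard library. It is only an illustration of the mixing problem, not the bilingual model itself, and script is an imperfect proxy for language: Hindi typed in Latin letters would be tagged "en".

```python
import unicodedata


def token_script(token):
    """Return "hi" for Devanagari tokens, "en" for Latin tokens, else "other"."""
    for ch in token:
        name = unicodedata.name(ch, "")
        if name.startswith("DEVANAGARI"):
            return "hi"
        if name.startswith("LATIN"):
            return "en"
    return "other"


def tag_scripts(utterance):
    """Tag every whitespace-separated token with its script label."""
    return [(token, token_script(token)) for token in utterance.split()]


def is_code_mixed(utterance):
    """True when the utterance contains both Latin and Devanagari tokens."""
    labels = {label for _, label in tag_scripts(utterance)}
    return {"hi", "en"} <= labels
```

For the slide's example, `tag_scripts("camera खुला करो")` labels the first token as Latin script and the other two as Devanagari.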

Long journey of Ruby standard library at RubyConf AU 2024
 

Samsung voice intelligence.v5.5

  • 1. Taking the Road Less Travelled: In pursuit of a Multi-modal experience for Bixby Samsung R&D Bangalore, India Dr. Vikram Vij vikram.v@samsung.com
  • 2. Intelligent Assistants are fast emerging as the next breakthrough user interface. 1990s: Web; 2000s: Apps; Today: Assistants
  • 3. Evolution of Human Computer Interface (Changes of Interface Paradigm): GUI (~1980s) → Touch UI (~2000) → Voice (2011) → Bixby (2017). Voice Assistant Market Research Report, Global Forecast 2023. Reference: https://www.marketresearchfuture.com/reports/voice-assistant-market-4003
  • 4. Bixby Introduction: Bixby is an intelligent, personalized voice interface for your phone. It is multi-modal: it lets you seamlessly switch between voice and touch modes. o Launch Date: 19th July 2017 (US), 22nd Aug (Global) o Available in more than 200 countries o More than 75 Domains supported (Camera, Gallery, Messages, WhatsApp, YouTube, Uber etc.) o More than 27 million registered users. Links: http://bixby.samsung.com/meet-bixby https://www.youtube.com/watch?v=dbmVtseEjo4&index=1&list=PLrV44rSVouDcbvky1f77mUjWLCq8WI-Z1 https://www.youtube.com/watch?v=Gcd4NpK2fTI
  • 6. Bixby Overview: Bixby is fundamentally different from other voice agents or assistants in the market because of its Completeness, Context Awareness, Cognitive Tolerance and Frictionless operation. Completeness: supporting every task of the application; supporting Samsung’s native apps (“Show me the Wi-Fi data usage”). Context Awareness: understanding the current context and state of the app (1. VOICE: “Find an umbrella photo”; 2. TOUCH: manual editing; 3. VOICE: “Post it on Instagram”). Cognitive Tolerance: understanding commands with incomplete info (“Send this photo via message” / “To whom?” / “To Jane” / “Done”); “Incomplete Command” (“Request incomplete. Error”). Frictionless: a true one-click action (Press & Hold): turn on, authenticate, unlock, wake the phone, execute the command
  • 7. Bixby - Cognitive Tolerance (ASR): incomplete or inaccurate instructions are also performed, using the context.
  • 8. Bixby | Human Computer Interface Revolution. Press quotes: “With English Support, Samsung's Bixby Impresses Vs. Siri And Google Assistant”; “Bixby is perhaps in the most precarious spot, as it’s going to be competing directly against Google Assistant on some devices.”; “Bixby’s capabilities sound quite impressive thanks to its integration with other Samsung apps”; “Galaxy S8's voice sidekick can do things Siri can't”
  • 9. Bixby v1.0: Minimalistic View. voice packet → ASR → text input → NLU → command. ASR: Automatic Speech Recognition; NLU: Natural Language Understanding
  • 10. Traditional NLU Flow: “Text to Mom” → NLU Platform (Machine Learning Models: Domain Classifier → Intent Classifier → Slot Tagger) → Command (Domain: Messages; Intent: Send Message; Slot: “Mom”)
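The three-stage flow on this slide (domain, then intent, then slots) can be sketched as below. This is a minimal illustration only: the keyword tables and the `recipient` slot rule are hypothetical stand-ins for the trained machine learning models, which the slides do not describe in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    domain: str
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical keyword tables standing in for the trained classifiers.
DOMAIN_KEYWORDS = {
    "Messages": {"text", "message"},
    "Gallery": {"photo", "photos"},
}
INTENT_KEYWORDS = {
    "Messages": {"Send Message": {"text", "send"}},
    "Gallery": {"Find Photo": {"find", "show"}},
}


def classify_domain(tokens):
    """Stage 1: pick the first domain whose keywords appear in the utterance."""
    words = {t.lower() for t in tokens}
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if words & keywords:
            return domain
    return None


def classify_intent(domain, tokens):
    """Stage 2: pick an intent, searching only inside the chosen domain."""
    words = {t.lower() for t in tokens}
    for intent, keywords in INTENT_KEYWORDS.get(domain, {}).items():
        if words & keywords:
            return intent
    return None


def tag_slots(tokens):
    """Stage 3: toy slot rule, the token after 'to' is the recipient."""
    slots = {}
    lowered = [t.lower() for t in tokens]
    if "to" in lowered and lowered.index("to") + 1 < len(tokens):
        slots["recipient"] = tokens[lowered.index("to") + 1]
    return slots


def understand(utterance):
    tokens = utterance.split()
    domain = classify_domain(tokens)
    if domain is None:
        return None
    intent = classify_intent(domain, tokens)
    if intent is None:
        return None
    return Command(domain, intent, tag_slots(tokens))
```

With these toy tables, `understand("Text to Mom")` yields the Messages domain, the Send Message intent and a `recipient` slot of `Mom`, mirroring the slide's example.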
  • 11. Key Challenges. Design: o Text and Voice: Co-existence of Dual Modality o Representation of Massive Input Space o Management of Massive Context o Handling of Variable Output Space o Design of Deep Learning Architecture to Achieve this. Data: o Managing the distribution and variations of data o Balance of Data to maintain the expected distribution of data across different classes o Special handling for rejection Data
  • 12. Bixby: The Multi-Modal Point of View. Touch Interface: ① Home → ② Settings → ③ Connections → ④ Data Usage. Voice Interface: “Show me the mobile data usage”
  • 13. Bixby: The Multi-Modal Point of View (cont’d). Touch UI: Screen Flow, with a context at every screen. Voice UI: “Find Hawaii photos in Gallery”. The same utterance resolves differently by context: “find James” in the Contacts application => contact information of James; “find James” in the Gallery application => images tagged as James
  • 14. Leap Required for NLU toward Multi-Modality. Traditional NLU → Multi-Modal NLU: Context Awareness; Massive Number of Contexts (thousands of states); Varying Set of Commands (various device models such as Note8, S8, TabS; apps; locales; …)
  • 15. Challenge of Massive Contextual Input Space. Input Space = (2,000 Contexts) x (Utterances for 6,000 commands). “Find James” + Picture View Context → James’ Picture; “Find James” + Contact View Context → James’ Contact. One static classifier per context does not scale: Context Space of 2000+ contexts, each with 6000+ command classes
  • 16. Motivation for Deep Learning: Deep Learning was chosen instead of SVMs, Random Forest etc. • Massive number of classes: approximately 60 classes for Domains; approximately 6K classes for Intents • Closeness of Domains: the nature of the classes is similar (examples: Reminder, Calendar and Clock) • Huge data: 10M data for Domain Classification; 1.5M data per Intent Classification (on average per Domain). Pipeline: Utterance → Domain Classification (Domain Label) → Intent Classification (Intent Label) → Slot Tagger (Slots)
  • 17. Approach for Massive Contextual Input Space: Context-conditioned DNN classifier + Sampling. Input: Context + Utterance; Output: one of 6000+ commands. Training Set (sampled): context_α + utterance_a → command_1; context_α + utterance_b → command_1; context_α + utterance_c → command_2; … context_β + utterance_a → command_1; context_β + utterance_b → command_2; context_β + utterance_c → command_2; … Also: Hierarchical classifier; Session based architecture; Rejection Logic in Intent
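A minimal sketch of how such a training set might be assembled: the context is folded into the classifier input, and each command class is down-sampled to a cap. The `<ctx:...>` input encoding, the function names and the cap value are assumptions for illustration; the slides do not specify how context is encoded or how sampling is performed.

```python
import random
from collections import defaultdict


def make_input(context, utterance):
    """Fold the screen context into the classifier input (hypothetical encoding)."""
    return f"<ctx:{context}> {utterance}"


def sample_per_class(examples, cap, seed=0):
    """Down-sample (context, utterance, command) triples to at most `cap` per command."""
    rng = random.Random(seed)
    by_command = defaultdict(list)
    for context, utterance, command in examples:
        by_command[command].append((make_input(context, utterance), command))

    sampled = []
    for command in sorted(by_command):
        rows = by_command[command]
        if len(rows) > cap:
            rows = rng.sample(rows, cap)  # random subset, without replacement
        sampled.extend(rows)
    return sampled
```

Because the context is part of the input, a single classifier can map the same utterance to different commands, which avoids training one static classifier per context.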
  • 18. Challenge of RNN vs CNN • The RNN word model had difficulty in: handling unknowns (word misspellings); learning word inflections (word boundary going beyond representation); state based learning • So switched to a CNN character model. Examples: “search for s8 plus” goes to the calculator domain; on the Settings Bluetooth screen, “turn off please” is detected as Wi-Fi off (issue: state is not learnt)
  • 19. Challenge of CNN Filter Size • Determining the optimal filter size • Smaller filter sizes are used for sub-word level features • Larger filter sizes are used for understanding language structures. Multiple filters of various sizes work in parallel; the final layer of the CNN gives the best output. Reference: hackerearth.com
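The idea of parallel character-level filters of different widths can be illustrated with a toy example. Here each filter is a fixed character pattern, and its response at a position is the fraction of matching characters, which is what a convolution over one-hot character vectors with a one-hot filter computes up to scaling. Real filters are learned and dense; the patterns below are purely hypothetical.

```python
def max_response(text, pattern):
    """Slide `pattern` over `text` and return the best match fraction (max pooling)."""
    width = len(pattern)
    if len(text) < width:
        return 0.0
    best = 0.0
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        matches = sum(a == b for a, b in zip(window, pattern))
        best = max(best, matches / width)
    return best


def features(text, patterns):
    """Apply filters of several widths in parallel, one pooled value per filter."""
    return [max_response(text, p) for p in patterns]


# Hypothetical filters: a short sub-word pattern, a word stem, and a longer phrase.
FILTERS = ["s8", "blue", "turn off"]
```

In this sketch a misspelling such as "bluetoth" still fully activates the short `blue` filter, which hints at why character-level features tolerate unknown or misspelled words better than a word-level vocabulary.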
  • 20. Challenge of Variable Output Space. The set of valid commands varies by Device Model (Note8, S8, TabS, …), App Version (V 1.1, …) and Locale (India, …). Example: Model A and Model B differ in which of “Turn on tethering”, “Turn on Bluetooth tethering” and “Turn on USB tethering” they support
  • 21. Approach for Variable Output Space: Version Management Mechanism for NLU Engine. Device (e.g. Note 8) sends Version Metadata (country, installed app info, OS version, …) → Server: Version DB holds version mask vectors (V1, V2, V3, …) → NLU Core: Command Classification
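One plausible reading of the version mask vectors is a per-device bit vector that hides commands the device cannot execute before the top-scoring command is chosen. The sketch below assumes that reading; the mask contents, the `(model, version)` key and the function name are hypothetical.

```python
COMMANDS = [
    "Turn on tethering",
    "Turn on Bluetooth tethering",
    "Turn on USB tethering",
]

# Hypothetical version DB: one mask bit per command, keyed by (model, version).
VERSION_MASKS = {
    ("Model A", "V1"): [1, 0, 0],
    ("Model B", "V1"): [1, 1, 1],
}


def classify_with_mask(scores, mask):
    """Return the best-scoring command among those the mask allows, or None."""
    allowed = [i for i, bit in enumerate(mask) if bit]
    if not allowed:
        return None
    return COMMANDS[max(allowed, key=lambda i: scores[i])]
```

The classifier itself stays the same for every device; only the mask changes, which is how the variable output space can be abstracted away from the model.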
  • 22. Key Learnings - Design • Need to experiment with various DNN architectures & parameters, and make sure each experiment has a rationale • The obvious choice of DNN may not work best: for text, RNNs are typically used, but CNNs proved to be better • Hierarchical design may work better (e.g. text classification) • Feature based matching for intent classes where 100% accuracy is needed • Rule-based matching for NER instead of ML/DL based NER • Rejection based intent classification for close domains • Can abstract out complexity where possible (e.g. variable output space)
  • 23. Massive Data Flow: Synthetic Generation of Data; Purchased (3rd Party) Data; Crawled Data for Out of Domain; Voice of Customer Data; Quick Grammar Data; DC Bucketed and annotated for Single Intent Class; DC and Intent Separated by Class Levels; Bucketed by Single Intent Class; Special Data (Market Issues & Bug Fixes for Intent and Domain); Sampled 2K/Class; Hand-cleaned and Consumed Total; Hand-cleaned & Down Sampled; Sampled 10~20K/Class; Service API Layer; Intent; Slot
  • 24. Data Governance – Training Data Used Tools to detect & resolve data conflicts across Domains & Intents • TF-IDF based tool • Cosine similarity based tool
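A conflict detector of this kind could flag pairs of training utterances that are nearly identical yet carry different labels. The sketch below is a plain-Python illustration using a smoothed TF-IDF weighting and cosine similarity; the function names, the weighting formula and the 0.8 threshold are assumptions, not the in-house tools themselves.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF term."""
    tokenized = [t.lower().split() for t in texts]
    n = len(texts)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            tok: (c / len(toks)) * (math.log((1 + n) / (1 + doc_freq[tok])) + 1)
            for tok, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_conflicts(labeled, threshold=0.8):
    """Return pairs of near-duplicate utterances that have different labels."""
    texts = [text for text, _ in labeled]
    vecs = tfidf_vectors(texts)
    conflicts = []
    for i in range(len(labeled)):
        for j in range(i + 1, len(labeled)):
            different_label = labeled[i][1] != labeled[j][1]
            if different_label and cosine(vecs[i], vecs[j]) >= threshold:
                conflicts.append((texts[i], texts[j]))
    return conflicts
```

The pairwise loop is quadratic, so at the data volumes quoted in the slides a real tool would need indexing or blocking; the sketch only shows the scoring idea.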
  • 25. Data Governance – Test Data: Unit Testing Automation (in-house automated unit test tool for Domain, Intent and Slot); E2E Testing Automation. Release flow: DEV Server → Accepted? (Y/N) → STG Server → Accepted? (Y/N) → PRD Server → Accepted? (Y/N) → End User. Development and management of data; analysis based on the Data Governance Tool; End User VOC issues
  • 26. Key Learnings - Data • Managing the distribution and variations of data is essential • Quality of data is critical o Balance of data to maintain the expected distribution of data across different classes o Special handling for rejection data • A Deep Learning Engineer / Data Scientist must spend 30% of his or her time looking at the data • People are needed to manage this volume of data • Tools / automation need to be developed for pre-processing of data • We cannot avoid hand-cleaning or hand-engineering of data • Obvious need for Data Governance as well as continuous monitoring of product quality • The NLP / ML driven project cycle (including data) is quite different from the conventional SW project cycle
  • 28. ASR: Challenge of Speech. Speech: is different for every speaker; may be fast, slow, or varying in speed; may have high pitch, low pitch, or be whispered; has widely-varying types of environmental noise; changes depending on the sequence of phonemes; changes depending on speaking style; may not have distinct boundaries between units; changes depending on the semantics of the utterance; has an unlimited number of words
  • 29. Bixby ASR - Fundamentals. ASR System: voice packet → Feature Extraction → Decoder (using Acoustic Model(s) and Language Model(s)) → ASR Hypothesis → Inverse Text Normalization
  • 30. Bixby ASR - Fundamentals • Acoustic Model: links acoustics to the word/phoneme sequence; estimates the likelihood of the acoustic sequence given a word/phoneme (LSTM) • Language Model: prior on word sequences; probability of a word given the preceding words (n-gram) • Decoder: finds the best word sequence, i.e. searches for the lowest-cost path in a graph; uses the Viterbi algorithm (dynamic programming)
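The decoder's lowest-cost path search can be illustrated with a textbook Viterbi pass over a tiny state graph. In the sketch below, costs play the role of negative log-probabilities: `emit_cost` stands in for the acoustic model and `trans_cost` for the language model. The states, observations and integer costs are made up for illustration and are not taken from Bixby.

```python
def viterbi(observations, states, start_cost, trans_cost, emit_cost):
    """Return (lowest-cost state path, total cost) via dynamic programming."""
    # cost[s] = lowest cost of any path that ends in state s so far
    cost = {s: start_cost[s] + emit_cost[s][observations[0]] for s in states}
    back = []  # one dict of back-pointers per time step after the first

    for obs in observations[1:]:
        new_cost, pointers = {}, {}
        for s in states:
            prev = min(states, key=lambda p: cost[p] + trans_cost[p][s])
            new_cost[s] = cost[prev] + trans_cost[prev][s] + emit_cost[s][obs]
            pointers[s] = prev
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final state.
    last = min(states, key=lambda s: cost[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, cost[last]


# Toy model with made-up integer costs.
STATES = ["A", "B"]
START_COST = {"A": 1, "B": 2}
TRANS_COST = {"A": {"A": 1, "B": 3}, "B": {"A": 3, "B": 1}}
EMIT_COST = {"A": {"x": 1, "y": 4}, "B": {"x": 4, "y": 1}}
```

For the observation sequence `["x", "y", "y"]`, the cheapest path is `A, B, B` with a total cost of 8: staying in `A` would pay the expensive `y` emission twice, so one switch to `B` is worth its transition cost.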
  • 31. Bixby ASR – Multi Accent. Countries: United States, China, India, United Kingdom, Spain, South Korea, Australia, Canada. Modes: DEFAULT, ACCENTED. Accent determination based on: on-boarding utterances, SIM card information, keyboard language, contact details
  • 32. Challenge for Indian Market • Hindi was targeted as the language of experimentation. • Indian languages such as Hindi are used in conjunction with English, e.g. “camera खुला करो” (“open the camera”). • We have developed a bi-lingual (English + Hindi) model for the Hindi classifier