Frankbot - ML framework for auto-responding
to customer support queries
Outline of the talk
● Introduction and Problem Statement
○ Introduction to Freshdesk
○ Motivation and Objectives
● Data and Methodology
○ Data source
○ Steps in the modelling pipeline
○ Modelling Pipeline
● Additional Bot Features
○ Periodic Model Refresh and Feedback consumption
○ Teach the bot
● Results
○ Metrics and business impact
○ Understanding the metrics
○ Why are some suggestions not helpful and some queries not answered
● Challenges and learnings
Introduction and Problem Statement
Introduction to Freshdesk
Freshdesk is a multi-channel cloud based customer support product, which enables businesses to
● Streamline all customer conversations in one place - these are conversations between the business and its end
customers
● Automate repetitive work and make support agents more efficient
● Enable support agents to collaborate with other teams to resolve issues faster
● Freshdesk tickets are a record of customer conversations across channels (phone, chat, e-mail, social, etc.)
○ A typical conversation includes customer queries and agent responses
○ Frequently recurring customer queries are called T1 tickets
● Freshdesk currently has ~150,000 customers from across the world
Some statistics from companies using Freshdesk
● Average proportion of T1 tickets - 80%
● Average proportion of tickets with answers in the knowledge base - 60%
● Average proportion of tickets with answers in the ticket conversation - 70%
Motivation and Objectives
Our motivation is to help support agents with the following
● Reduce time spent on T1 tickets by auto-resolving them, thereby enhancing their overall
productivity levels
● Reduce time spent on T2 tickets by showing similar tickets which can help in looking up useful
information that can aid in resolution
● Help in understanding the different types of questions raised by customers, which in turn
will aid in FAQ creation
Our objectives in Frankbot development are the following
● Intercept and auto-resolve T1 tickets by leveraging content from the knowledge base
● Enable support agents to train the bot further by mapping customer queries to expected responses
Frankbot in production
Data and Methodology
Data Source for Model Training
● Source - Freshdesk data pertaining to customer (business) accounts
○ Knowledge base articles, FAQs
○ Tickets from different channels such as e-mail, portal (raised on website), chat,
social and phone
● Data of different accounts - All active and paid accounts with at least 100 tickets in the
last 3 months and 1 article in the knowledge base
● Training strategy
○ One model per account trained end-to-end
○ Embeddings trained at industry level, models at account level
Note: Tickets from email, portal-direct, chat and phone channels account for close to 95% of
the ticket volume
Steps in the Modelling Pipeline
L1 Embedding Layer
❖ Preprocessed text is transformed
into meaningful vectors
❖ Two vectors are generated per
query - one from LSA and the
other from word vectors trained
using FastText.
❖ Sentence vectors are obtained by
averaging the normalized vectors
of words in a sentence
❖ LSA vectors are trained per
account while word vectors are
trained per industry
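The sentence-vector step above can be sketched as follows. This is a minimal illustration, not the production code: the toy 2-d vectors stand in for trained FastText embeddings, and the function names and the choice to skip out-of-vocabulary words are assumptions.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def sentence_vector(tokens, word_vectors, dim):
    """Average the unit-normalized vectors of the in-vocabulary tokens."""
    known = [l2_normalize(word_vectors[t]) for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # assumption: all-zero vector when no token is known
    return [sum(col) / len(known) for col in zip(*known)]

# Toy vectors (hypothetical); "my" is out of vocabulary and is skipped.
toy_vectors = {"reset": [3.0, 4.0], "password": [0.0, 2.0]}
vec = sentence_vector(["reset", "my", "password"], toy_vectors, dim=2)
```

Normalizing before averaging keeps words with large-magnitude vectors from dominating the sentence vector.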
Data
❖ Includes tickets from last
3 months + KB* articles
❖ Train set = tickets + KB
articles - test tickets
❖ Test set = tickets in the
last 10 days (no overlap
with train)
❖ Responses = KB articles
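A minimal sketch of the time-based split described above, assuming each ticket carries a creation date and approximating "last 3 months" as 90 days. The function name and tuple layout are hypothetical.

```python
from datetime import date, timedelta

def split_tickets(tickets, today, test_days=10, window_days=90):
    """tickets: list of (created_on, text). Returns (train, test) with no overlap."""
    window_start = today - timedelta(days=window_days)
    test_start = today - timedelta(days=test_days)
    recent = [t for t in tickets if window_start <= t[0] <= today]
    train = [t for t in recent if t[0] < test_start]   # older tickets
    test = [t for t in recent if t[0] >= test_start]   # tickets in the last 10 days
    return train, test

sample = [
    (date(2018, 12, 28), "cannot log in"),
    (date(2018, 11, 1), "refund status"),
    (date(2018, 6, 1), "too old, outside the window"),
]
train, test = split_tickets(sample, today=date(2018, 12, 31))
```

KB articles would then be added to the train set, and they also serve as the candidate responses.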
Preprocessing
The steps are as follows (in order):
❖ Email Cleaning - signature cleaning,
cleaning forwarded emails, removal
of code constructs, non-ASCII
characters, salutation, text below
signature
❖ Primary preprocessing - unicode
normalization, lower casing,
punctuation removal, stop words
removal & stemming
❖ Secondary preprocessing - bigram
processing
*KB - Knowledge Base
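The primary and secondary preprocessing steps can be sketched with the standard library alone. This is an illustration under stated assumptions: the stop-word list is a tiny hypothetical subset, stemming is omitted (it would normally come from an NLP library), and "bigram processing" is read here as appending adjacent-token bigrams, which the slides do not confirm.

```python
import string
import unicodedata

# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"a", "an", "the", "is", "to", "my", "i", "and", "of"}

def primary_preprocess(text):
    """Unicode normalization, lower casing, punctuation and stop-word removal."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def add_bigrams(tokens):
    """Assumed form of secondary preprocessing: append adjacent-token bigrams."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

tokens = primary_preprocess("I want to RESET my password!")
with_bigrams = add_bigrams(tokens)
```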
Steps in the Modelling Pipeline
L2 Layer
❖ Using the L1 vectors, 3 candidate responses
with the highest cosine similarity are picked for
every query
❖ L2 layer is a classification layer with the
dependent variable as 1 if at least one of the 3
responses is relevant to the query else 0.
❖ Labelling is done by human annotators
❖ Features for the model include % word match
between nouns/verbs/adjectives of words in the
query and candidate responses, word mover
distance and ordered bigram & trigram counts
❖ XGBoost algorithm is used for training
❖ Prediction probability from the L2 model is used
in decisioning
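Two of the L2 features listed above can be sketched as follows. This is a simplified illustration: the part-of-speech filtering (nouns/verbs/adjectives) and word mover distance are left out because they need external NLP libraries, and "ordered n-gram count" is interpreted here as the number of distinct n-grams shared in the same word order, which is an assumption.

```python
def word_match_pct(query_tokens, response_tokens):
    """Share of distinct query words that also appear in the response."""
    query_words = set(query_tokens)
    if not query_words:
        return 0.0
    return len(query_words & set(response_tokens)) / len(query_words)

def ordered_ngram_count(query_tokens, response_tokens, n):
    """Number of distinct n-grams (word order preserved) common to both texts."""
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return len(ngrams(query_tokens) & ngrams(response_tokens))

query = ["reset", "account", "password"]
response = ["how", "to", "reset", "account", "password", "online"]
features = [
    word_match_pct(query, response),
    ordered_ngram_count(query, response, 2),
    ordered_ngram_count(query, response, 3),
]
```

A feature row like this, built per query and candidate set, is what a gradient-boosted classifier such as XGBoost would be trained on against the human labels.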
Response Retrieval
❖ Every query has an LSA vector and a sentence vector
aggregated from word vectors
❖ Two L1 scores are estimated for every response
based on the 2 vectors.
❖ The maximum of the 2 L1 scores is computed for
every response and 3 responses with the highest L1
score are chosen
❖ A set of 3 responses has an L2 score
❖ L2 score is used for gating
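The retrieval rule above (maximum of the two cosine scores per response, then the 3 highest) can be sketched as below. The dictionary layout, article ids, and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_candidates(query_lsa, query_wv, responses, k=3):
    """responses: {article_id: (lsa_vector, word_vector_average)}."""
    scored = []
    for article_id, (lsa_vec, wv_vec) in responses.items():
        # L1 score of a response = max of its LSA score and word-vector score.
        l1 = max(cosine(query_lsa, lsa_vec), cosine(query_wv, wv_vec))
        scored.append((l1, article_id))
    # Highest score first; ties broken by article id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(article_id, score) for score, article_id in scored[:k]]

responses = {
    "kb1": ([1.0, 0.0], [0.0, 1.0]),
    "kb2": ([0.0, 1.0], [1.0, 0.0]),
    "kb3": ([1.0, 1.0], [1.0, 1.0]),
    "kb4": ([-1.0, 0.0], [0.0, -1.0]),
}
top3 = top_candidates([1.0, 0.0], [1.0, 0.0], responses)
```

Taking the maximum lets either representation surface a response: here kb1 is found through its LSA vector and kb2 through its word-vector average.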
Offline Training Pipeline
● Inputs: train data, candidate responses (n), test data (m)
● Preprocessing - email cleaning, primary & secondary preprocessing
● L1 (Embedding) Layer - training
○ Candidate responses (n), test vectors (m)
● Pick top k responses based on L1 scores (m*k)
● Feature creation
● Preprocessing - missing value imputation, outlier treatment, scaling
● Split for L2: train data (t), candidate responses (k), test data (m-t)
● L2 (Classification) Layer - training
● Relevance probability vector ((m-t)*k)
● Pick top 3 based on probability ((m-t)*3) + evaluation
● Storage (Redis, S3)
○ Write lookup / word vectors / idf
○ Write class model object
○ Write L1 & L2 thresholds for gating and ranking
Online Processing Pipeline
● Input: query
● Preprocessing - email cleaning, primary & secondary preprocessing
● L1 (Embedding) Layer - transformation
○ Candidate response vectors (n), query vector
● Pick top k based on similarity (1*k)
● Feature creation
● Preprocessing - missing value imputation, outlier treatment, scaling
● L2 (Classification) Layer - prediction
● Relevance probability vector (1*k)
● Pick top 3 based on probability (1*3)
● Storage (Redis, S3)
○ Read lookup / word vectors / idf
○ Read class model object
○ Read L1 & L2 thresholds for gating and ranking
L1 and L2 Gating Logic
❖ L1 score refers to the
maximum of the L1 scores of
the 3 candidate responses
❖ The optimal thresholds viz. L1
upper, L2 upper, L1 lower and
L2 lower are estimated by
simulating boundaries on a
sample of labelled data
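The exact decision rule is not spelled out in the slide text, so the sketch below shows only one plausible way four thresholds could gate a response. The rule itself, the threshold values, and all names are assumptions for illustration, not the rule used by Frankbot.

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    l1_upper: float
    l1_lower: float
    l2_upper: float
    l2_lower: float

def should_respond(l1_score, l2_score, th):
    """ASSUMED rule: a strong score on one layer needs only a weaker
    confirmation from the other layer; otherwise the bot stays silent."""
    strong_l1 = l1_score >= th.l1_upper and l2_score >= th.l2_lower
    strong_l2 = l2_score >= th.l2_upper and l1_score >= th.l1_lower
    return strong_l1 or strong_l2

# Hypothetical threshold values; in practice they would be estimated
# on labelled data as described above.
th = Thresholds(l1_upper=0.8, l1_lower=0.5, l2_upper=0.7, l2_lower=0.4)
decisions = [
    should_respond(0.85, 0.45, th),  # high L1, adequate L2
    should_respond(0.55, 0.75, th),  # adequate L1, high L2
    should_respond(0.60, 0.60, th),  # neither score is strong
    should_respond(0.90, 0.30, th),  # high L1 but L2 below its lower bound
]
```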
Additional Bot Features
Periodic Model Refresh and Feedback Consumption
● Model refresh is key to ensuring that the models are up to date and stay relevant over
time
● This is done once a week, or as soon as an account accumulates a sizeable number of
new queries or knowledge base updates
● It involves the following steps
○ Retraining the LSA model after including the newly accumulated data
○ Incremental training of word vectors with new data
○ Retraining the L2 model on recent data, using feedback provided by customers as the dependent variable
Teach the bot
● Teach the bot is a feature that allows customer support agents to explicitly train the bot by
ingesting Q → A mappings
● When the Answer bot fails to respond to a query (Q), the agent can point the bot to the expected
response (A) which should have been returned
● If a suitable response (A) does not exist in the Knowledge base, it can be created on-the-fly
● This expected response (A) is consumed and mapped to be close to the query vector (Q) in the
L1 vector space
○ This ensures that article A would show up for future queries that are similar to Q
○ The same feature is re-purposed to resolve incorrect bot responses as well
○ This feature also helps to improve the overall coverage levels of the Answer bot
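One simple way to picture "mapping A close to Q" is to register the query vector as an extra anchor for article A in the retrieval index. The slides do not say how the mapping is implemented, so the class below, its method names, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

class L1Index:
    """Toy nearest-neighbour index over (vector, article_id) entries."""

    def __init__(self):
        self.entries = []

    def add(self, vector, article_id):
        self.entries.append((list(vector), article_id))

    def teach(self, query_vector, article_id):
        # ASSUMPTION: teaching stores the query vector as an additional
        # anchor for the article, so similar future queries land on it.
        self.add(query_vector, article_id)

    def best(self, query_vector):
        return max(self.entries, key=lambda e: cosine(query_vector, e[0]))[1]

index = L1Index()
index.add([1.0, 0.0], "billing-faq")
index.add([0.0, 1.0], "login-faq")
query = [0.8, 0.6]
before = index.best(query)        # nearest article before teaching
index.teach(query, "login-faq")   # agent maps Q to the expected article A
after = index.best(query)         # nearest article after teaching
```

The same mechanism covers both cases mentioned above: unanswered queries and incorrect bot responses.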
Results
Metrics and business impact
Month     # Active Clients   # Requests   # Responded   # Helpful   # No Feedback   % Deflection
May’18    97                 10,805       6,075         1,657       1,868           15.34%
Jun’18    151                22,195       12,969        2,550       5,981           11.49%
July’18   182                30,376       19,330        3,792       5,669           12.48%
Aug’18    242                50,049       29,948        5,940       7,839           11.87%
Sep’18    347                63,587       38,064        8,308       10,112          13.07%
Oct’18    457                101,493      56,390        16,589      33,360          16.34%
Nov’18    478                130,687      78,902        25,680      46,555          19.65%
Dec’18    480                137,517      82,366        23,713      52,772          17.24%
● CSAT* - 79% with bots and 72% without bots
● Average First Response Time (overall) - 13 hrs with bots and 19 hrs without bots
*CSAT - Customer Satisfaction Score
Understanding the Metrics
● # Active clients - number of customers who are exposing the bot to their customers in their
support portal
● # Requests - number of requests that the bot gets
● # Responded - number of requests responded/answered by the bot
● # Helpful - number of requests where the bot responses were helpful
○ Alongside every bot response, a “Was this helpful?” message is also shown and the user’s
feedback is solicited. This helps in tracking helpful responses.
● # No Feedback - number of bot responses for which there was no feedback from users
● % Deflection - ratio of # Helpful to # Requests, expressed as a percentage
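As a quick check of the deflection definition, the figures for two months from the results table can be recomputed. The function name is hypothetical; the inputs are taken from the table.

```python
def deflection_pct(helpful, requests):
    """% Deflection = # Helpful / # Requests, as a percentage."""
    return 100.0 * helpful / requests if requests else 0.0

# (requests, helpful) for May'18 and Nov'18, from the results table.
may = round(deflection_pct(helpful=1657, requests=10805), 2)
nov = round(deflection_pct(helpful=25680, requests=130687), 2)
```

Note that the denominator is all requests, not only those the bot responded to, so unanswered queries lower the deflection rate.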
Why are some suggestions not helpful and some queries not answered
● Query could relate to a new topic for which there may not be enough FAQs or articles
● Query could relate to an existing topic but may contain keywords which are not in the vocabulary
- This may result in low L1 and L2 confidence which may not satisfy the thresholds
● Query may be related to a particular action - Example: “Can you connect me to an agent?”
which is a question for a task completion bot that has intent detection capabilities
● Query may not have a question or issue - Example: “I have an open ticket 3335924”
● Query may be ambiguous or unclear - Example: “discussion”
Challenges and learnings
Challenges:
● Developing a preprocessing mechanism that can extract only the salient components from
messy emails
● Handling the complexity of storing and retrieving vectors of floats (idfs, SVD components, word
vectors) for every account
● Serving predictions at low latency
● Usage of the right tools for monitoring and finding bugs in the codebase in a proactive manner
Lessons Learnt:
● Involve data engineers at the very beginning
● Define success metrics and inform stakeholders about what a reasonable target is
● Define strategies for model refresh and feedback consumption
Thank You

 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 

ML Framework for auto-responding to customer support queries

  • 1. Frankbot - ML framework for auto-responding to customer support queries
  • 2. Outline of the talk
    ● Introduction and Problem Statement
      ○ Introduction to Freshdesk
      ○ Motivation and Objectives
    ● Data and Methodology
      ○ Data source
      ○ Steps in the modelling pipeline
      ○ Modelling Pipeline
    ● Additional Bot Features
      ○ Periodic Model Refresh and Feedback consumption
      ○ Teach the bot
    ● Results
      ○ Metrics and business impact
      ○ Understanding the metrics
      ○ Why are some suggestions not helpful and some queries not answered
    ● Challenges and learnings
  • 4. Introduction to Freshdesk
    Freshdesk is a multi-channel cloud based customer support product, which enables businesses to
    ● Streamline all customer conversations in one place - these are conversations between the business and its end customers
    ● Automate repetitive work and make support agents more efficient
    ● Enable support agents to collaborate with other teams to resolve issues faster
    ● Freshdesk tickets are a record of customer conversations across channels (read phone, chat, e-mail, social, etc.)
      ○ A typical conversation includes customer queries and agent responses
      ○ Frequently recurring customer queries are called T1 tickets
    ● Freshdesk currently has ~150,000 customers from across the world
    Some statistics from companies using Freshdesk
    ● Average proportion of T1 tickets - 80%
    ● Average proportion of tickets with answers in the knowledge base - 60%
    ● Average proportion of tickets with answers in the ticket conversation - 70%
  • 5. Motivation and Objectives
    Our motivation is to help support agents with the following
    ● Reduce time spent on T1 tickets by auto-resolving them, thereby enhancing their overall productivity levels
    ● Reduce time spent on T2 tickets by showing similar tickets which can help in looking up useful information that can aid in resolution
    ● Help in understanding the different types of questions which are raised by customers which in turn will aid in FAQ creation
    Our objectives in Frankbot development are the following
    ● Intercept and auto-resolve T1 tickets by leveraging content from the knowledge base
    ● Enable support agents to train the bot further by mapping customer queries to expected responses
  • 8. Data Source for Model Training
    ● Source - Freshdesk data pertaining to customer (business) accounts
      ○ Knowledge base articles, FAQs
      ○ Tickets from different channels such as e-mail, portal (raised on website), chat, social and phone
    ● Data of different accounts - All active and paid accounts with at least 100 tickets in the last 3 months and 1 article in the knowledge base
    ● Training strategy
      ○ One model per account trained end-to-end
      ○ Embeddings trained at industry level, models at account level
    Note: Tickets from email, portal-direct, chat and phone channels account for close to 95% of the ticket volume
  • 9. Steps in the Modelling Pipeline
    Data
    ❖ Includes tickets from last 3 months + KB* articles
    ❖ Train set = tickets + KB articles - test tickets
    ❖ Test set = tickets in the last 10 days (no overlap with train)
    ❖ Responses = KB articles
    Preprocessing
    The steps are as follows (in order):
    ❖ Email Cleaning - signature cleaning, cleaning forwarded emails, removal of code constructs, non-ascii characters, salutation, text below signature
    ❖ Primary preprocessing - unicode normalization, lower casing, punctuation removal, stop words removal & stemming
    ❖ Secondary preprocessing - bigram processing
    L1 Embedding Layer
    ❖ Preprocessed text is transformed into meaningful vectors
    ❖ Two vectors are generated per query - one from LSA and the other from word vectors trained using FastText.
    ❖ Sentence vectors are obtained by averaging the normalized vectors of words in a sentence
    ❖ LSA vectors are trained per account while word vectors are trained per industry
    *KB - Knowledge Base
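The sentence-vector step of the L1 embedding layer can be illustrated with a minimal sketch. It assumes that "normalized" means scaling each word vector to unit L2 length and that words missing from the vocabulary are simply skipped; neither detail is stated on the slide. The function names and the toy two-dimensional vectors are hypothetical and stand in for trained FastText vectors.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumed meaning of 'normalized')."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def sentence_vector(tokens, word_vectors):
    """Average the normalized vectors of the known words in a sentence.

    Returns None when no token is in the vocabulary (an assumed convention).
    """
    known = [l2_normalize(word_vectors[t]) for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy vocabulary standing in for trained word vectors (hypothetical values).
toy_vectors = {"reset": [3.0, 4.0], "password": [0.0, 2.0]}
vec = sentence_vector(["reset", "my", "password"], toy_vectors)
```

Normalizing before averaging keeps words with large-magnitude vectors from dominating the sentence vector.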
  • 10. Steps in the Modelling Pipeline
    Response Retrieval
    ❖ Every query has an LSA vector and a sentence vector aggregated from word vectors
    ❖ Two L1 scores are estimated for every response based on the 2 vectors.
    ❖ The maximum of the 2 L1 scores is computed for every response and 3 responses with the highest L1 score are chosen
    ❖ A set of 3 responses has an L2 score
    ❖ L2 score is used for gating
    L2 Layer
    ❖ Using the L1 vectors, 3 candidate responses with the highest cosine similarity are picked for every query
    ❖ L2 layer is a classification layer with the dependent variable as 1 if at least one of the 3 responses is relevant to the query, else 0.
    ❖ Labelling is done by human annotators
    ❖ Features for the model include % word match between the nouns/verbs/adjectives in the query and those in the candidate responses, word mover distance and ordered bigram & trigram counts
    ❖ XGBoost algorithm is used for training
    ❖ Prediction probability from the L2 model is used in decisioning
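The response retrieval step described above can be sketched as follows. This is an illustration under assumptions, not the production code: each response is assumed to carry one LSA vector and one word-vector-based sentence vector, the L1 score is taken to be cosine similarity in each space, and the names `top_candidates`, `responses` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_candidates(query_lsa, query_wv, responses, k=3):
    """Score each response by the max of its two L1 scores and keep the top k."""
    scored = []
    for resp_id, (resp_lsa, resp_wv) in responses.items():
        l1_score = max(cosine(query_lsa, resp_lsa), cosine(query_wv, resp_wv))
        scored.append((l1_score, resp_id))
    scored.sort(reverse=True)
    return [(resp_id, score) for score, resp_id in scored[:k]]


# Toy KB articles: id -> (LSA vector, sentence vector). Values are made up.
responses = {
    "kb1": ([1.0, 0.0], [1.0, 0.0]),
    "kb2": ([0.0, 1.0], [1.0, 1.0]),
    "kb3": ([3.0, 4.0], [1.0, 0.0]),
    "kb4": ([0.0, 1.0], [1.0, 0.0]),
}
candidates = top_candidates([1.0, 0.0], [0.0, 1.0], responses)
```

Taking the maximum lets a response qualify if it is close to the query in either vector space; the three surviving candidates would then be passed to the L2 classifier.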
  • 11. Offline Training Pipeline (flow diagram; boxes listed in reading order)
    ❖ Inputs: Train data, Candidate responses (n), Test data (m)
    ❖ Preprocessing - Email cleaning, primary & secondary preprocessing
    ❖ L1 (Embedding) Layer - training
    ❖ Candidate responses (n), Test vectors (m)
    ❖ Pick top k responses based on L1 scores (m*k)
    ❖ Feature Creation
    ❖ Preprocessing - missing value imputation, outlier treatment, scaling
    ❖ Train data (t), Candidate responses (k), Test data (m-t)
    ❖ L2 (Classification) Layer - training
    ❖ Relevance Probability Vector ((m-t)*k)
    ❖ Pick top 3 based on prob ((m-t)*3) + evaluation
    Storage (Redis, S3)
    ❖ Write Lookup/word vectors/idf
    ❖ Write class model object
    ❖ Write L1 & L2 thresholds for gating and ranking
  • 12. Online Processing Pipeline (flow diagram; boxes listed in reading order)
    ❖ I/P Query
    ❖ Preprocessing - Email cleaning, primary & secondary preprocessing
    ❖ L1 (Embedding) Layer - transformation
    ❖ Candidate response vectors (n), Query vector
    ❖ Pick top k based on similarity (1*k)
    ❖ Feature Creation
    ❖ Preprocessing - missing value imputation, outlier treatment, scaling
    ❖ L2 (Classification) Layer - prediction
    ❖ Relevance Probability Vector (1*k)
    ❖ Pick top 3 based on prob (1*3)
    Storage (Redis, S3)
    ❖ Read Lookup/word vectors/idf
    ❖ Read class model object
    ❖ Read L1 & L2 thresholds for gating and ranking
  • 13. L1 and L2 Gating Logic
    ❖ L1 score refers to the maximum of the L1 scores of the 3 candidate responses
    ❖ The optimal thresholds viz. L1 upper, L2 upper, L1 lower and L2 lower are estimated by simulating boundaries on a sample of labelled data
  • 15. Periodic model refresh and Feedback consumption
    ● Model refresh is key to ensuring that the models are up to date and stay relevant over time
    ● This is done once a week; or as soon as an account accumulates a sizeable number of new queries or Knowledge base updates
    ● It involves the following steps
      ○ Retraining the LSA model after including the newly accumulated data
      ○ Incremental training of word vectors with new data
    ● Retraining the L2 model on recent data
      ○ The L2 model is trained using feedback provided by customers as the dependent variable
  • 16. Teach the bot
    ● Teach the bot is a feature that allows customer support agents to explicitly train the bot by ingesting Q → A mappings
    ● When the Answer bot fails to respond to a query (Q), the agent can point the bot to the expected response (A) which should have been returned
    ● If a suitable response (A) does not exist in the Knowledge base, it can be created on-the-fly
    ● This expected response (A) is consumed and mapped to be close to the query vector (Q) in the L1 vector space
      ○ This ensures that article A would show up for future queries that are similar to Q
      ○ The same feature is re-purposed to resolve incorrect bot responses as well
      ○ This feature also helps to improve the overall coverage levels of the Answer bot
  • 18. Metrics and business impact

    Month     # Active Clients   # Requests   # Responded   # Helpful   # No Feedback   % Deflection
    May’18                  97       10,805         6,075       1,657           1,868         15.34%
    Jun’18                 151       22,195        12,969       2,550           5,981         11.49%
    July’18                182       30,376        19,330       3,792           5,669         12.48%
    Aug’18                 242       50,049        29,948       5,940           7,839         11.87%
    Sep’18                 347       63,587        38,064       8,308          10,112         13.07%
    Oct’18                 457      101,493        56,390      16,589          33,360         16.34%
    Nov’18                 478      130,687        78,902      25,680          46,555         19.65%
    Dec’18                 480      137,517        82,366      23,713          52,772         17.24%

    ● CSAT* - 79% with bots and 72% without bots
    ● Average First Response Time (overall) - 13 hrs with bots and 19 hrs without bots
    *CSAT - Customer Satisfaction Score
  • 19. Understanding the Metrics
    ● # Active clients - number of customers who are exposing the bot to their customers in their support portal
    ● # Requests - number of requests that the bot gets
    ● # Responded - number of requests responded/answered by the bot
    ● # Helpful - number of requests where the bot responses were helpful
      ○ Alongside every bot response, a “Was this helpful?” message is also shown and the user’s feedback is solicited. This helps in tracking helpful responses.
    ● # No Feedback - number of bot responses for which there was no feedback from users
    ● % Deflection - Ratio of the # Helpful and # Requests
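As a quick check of the % Deflection definition, the short sketch below recomputes it as # Helpful divided by # Requests for three months from the metrics slide. The function name `deflection_pct` and the rounding to two decimals are illustrative choices, not something stated in the talk.

```python
def deflection_pct(helpful, requests):
    """% Deflection = # Helpful / # Requests, expressed as a percentage."""
    if requests == 0:
        return 0.0
    return round(100.0 * helpful / requests, 2)


# (# Helpful, # Requests) copied from the metrics slide.
monthly = {
    "May'18": (1657, 10805),
    "Nov'18": (25680, 130687),
    "Dec'18": (23713, 137517),
}
deflection = {month: deflection_pct(h, r) for month, (h, r) in monthly.items()}
```

The recomputed values agree with the % Deflection column on the metrics slide, which confirms that the denominator is all requests rather than only the requests the bot responded to.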
  • 20. Why are some suggestions not helpful and some queries not answered
    ● Query could relate to a new topic for which there may not be enough FAQs or articles
    ● Query could relate to an existing topic but may contain keywords which are not in the vocabulary - This may result in low L1 and L2 confidence which may not satisfy the thresholds
    ● Query may be related to a particular action - Example: “Can you connect me to an agent?” which is a question for a task completion bot that has intent detection capabilities
    ● Query may not have a question or issue - Example: “I have an open ticket 3335924”
    ● Query may be ambiguous or unclear - Example: “discussion”
  • 21. Challenges and learnings
    Challenges:
    ● Developing a preprocessing mechanism that can extract only the salient components from messy emails
    ● Handling the complexity of storing and retrieving vector of floats (idfs, SVD components, word vectors) for every account
    ● Serving predictions at low latency
    ● Usage of the right tools for monitoring and finding bugs in the codebase in a proactive manner
    Lessons Learnt:
    ● Involve data engineers at the very beginning
    ● Define success metrics and inform stakeholders about what a reasonable target is
    ● Define strategies for model refresh and feedback consumption