This presentation describes how machine learning can be used to build a bot that understands natural-language customer queries and responds with suitable answers.
ML Framework for auto-responding to customer support queries
1. Frankbot - ML framework for auto-responding to customer support queries
2. Outline of the talk
● Introduction and Problem Statement
○ Introduction to Freshdesk
○ Motivation and Objectives
● Data and Methodology
○ Data source
○ Steps in the modelling pipeline
○ Modelling Pipeline
● Additional Bot Features
○ Periodic Model Refresh and Feedback consumption
○ Teach the bot
● Results
○ Metrics and business impact
○ Understanding the metrics
○ Why are some suggestions not helpful and some queries not answered
● Challenges and learnings
4. Introduction to Freshdesk
Freshdesk is a multi-channel, cloud-based customer support product, which enables businesses to
● Streamline all customer conversations in one place - these are conversations between the business and its end customers
● Automate repetitive work and make support agents more efficient
● Enable support agents to collaborate with other teams to resolve issues faster
● Freshdesk tickets are a record of customer conversations across channels (phone, chat, e-mail, social, etc.)
○ A typical conversation includes customer queries and agent responses
○ Frequently recurring customer queries are called T1 tickets
● Freshdesk currently has ~150,000 customers from across the world
Some statistics from companies using Freshdesk
● Average proportion of T1 tickets - 80%
● Average proportion of tickets with answers in the knowledge base - 60%
● Average proportion of tickets with answers in the ticket conversation - 70%
5. Motivation and Objectives
Our motivation is to help support agents with the following
● Reduce the time spent on T1 tickets by auto-resolving them, thereby improving overall agent productivity
● Reduce the time spent on T2 tickets by showing similar tickets that help agents look up information useful for resolution
● Help in understanding the different types of questions raised by customers, which in turn aids FAQ creation
Our objectives in Frankbot development are the following
● Intercept and auto-resolve T1 tickets by leveraging content from the knowledge base
● Enable support agents to train the bot further by mapping customer queries to expected responses
8. Data Source for Model Training
● Source - Freshdesk data pertaining to customer (business) accounts
○ Knowledge base articles, FAQs
○ Tickets from different channels such as e-mail, portal (raised on website), chat, social and phone
● Data of different accounts - all active and paid accounts with at least 100 tickets in the last 3 months and 1 article in the knowledge base
● Training strategy
○ One model per account, trained end-to-end
○ Embeddings trained at industry level, models at account level
Note: Tickets from the email, portal-direct, chat and phone channels account for close to 95% of the ticket volume
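The account-selection rule above can be sketched as a simple filter. The field names are illustrative assumptions, not Freshdesk's actual schema:

```python
def eligible_for_training(account):
    """Account-selection rule from the slide: an account gets its own model
    only if it is active and paid, has at least 100 tickets in the last
    3 months, and at least 1 knowledge-base article. Field names are
    illustrative, not Freshdesk's actual schema."""
    return (account["active"]
            and account["paid"]
            and account["tickets_last_3_months"] >= 100
            and account["kb_articles"] >= 1)

# Example: a paid, active account with 250 recent tickets and 5 articles qualifies
acct = {"active": True, "paid": True,
        "tickets_last_3_months": 250, "kb_articles": 5}
```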
9. Steps in the Modelling Pipeline
Data
❖ Includes tickets from the last 3 months + KB* articles
❖ Train set = tickets + KB articles - test tickets
❖ Test set = tickets in the last 10 days (no overlap with train)
❖ Responses = KB articles
Preprocessing
The steps are as follows (in order):
❖ Email cleaning - signature cleaning, cleaning forwarded emails, removal of code constructs, non-ASCII characters, salutations and text below the signature
❖ Primary preprocessing - Unicode normalization, lower-casing, punctuation removal, stop-word removal & stemming
❖ Secondary preprocessing - bigram processing
L1 Embedding Layer
❖ Preprocessed text is transformed into meaningful vectors
❖ Two vectors are generated per query - one from LSA and the other from word vectors trained using FastText
❖ Sentence vectors are obtained by averaging the normalized vectors of the words in a sentence
❖ LSA vectors are trained per account, while word vectors are trained per industry
*KB - Knowledge Base
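The sentence-vector step of the L1 layer can be sketched in plain Python. The vocabulary and 3-d vectors below are toy stand-ins for the FastText vectors trained per industry:

```python
from math import sqrt

# Toy word-vector table standing in for per-industry FastText vectors;
# the words and 3-d values are purely illustrative.
WORD_VECS = {
    "refund":  [0.9, 0.1, 0.2],
    "request": [0.7, 0.3, 0.1],
}

def normalize(v):
    """Scale a vector to unit length."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def sentence_vector(tokens, word_vecs):
    """Average the normalized vectors of the in-vocabulary words,
    as the slide describes for building sentence vectors."""
    vecs = [normalize(word_vecs[t]) for t in tokens if t in word_vecs]
    if not vecs:
        return None  # no known words -> no sentence vector
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
```

Normalizing before averaging keeps long, frequent words from dominating the sentence representation.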
10. Steps in the Modelling Pipeline
L2 Layer
❖ Using the L1 vectors, the 3 candidate responses with the highest cosine similarity are picked for every query
❖ The L2 layer is a classification layer whose dependent variable is 1 if at least one of the 3 responses is relevant to the query, else 0
❖ Labelling is done by human annotators
❖ Features for the model include the % word match between the nouns/verbs/adjectives of the query and the candidate responses, word mover's distance, and ordered bigram & trigram counts
❖ The XGBoost algorithm is used for training
❖ The prediction probability from the L2 model is used in decisioning
Response Retrieval
❖ Every query has an LSA vector and a sentence vector aggregated from word vectors
❖ Two L1 scores are estimated for every response based on the 2 vectors
❖ The maximum of the 2 L1 scores is computed for every response, and the 3 responses with the highest L1 score are chosen
❖ A set of 3 responses has an L2 score
❖ The L2 score is used for gating
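The response-retrieval step can be sketched as follows. Response ids and vectors are illustrative; each response is scored in both the LSA and word-vector spaces, and the maximum of the two cosine similarities is its L1 score:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top3(query_lsa, query_sent, responses):
    """L1 retrieval as described above: score each response in both vector
    spaces, keep the maximum of the two cosine similarities, and return the
    3 highest-scoring responses. `responses` maps response id ->
    (lsa_vector, sentence_vector); ids and vectors are illustrative."""
    scored = [
        (rid, max(cosine(query_lsa, v_lsa), cosine(query_sent, v_sent)))
        for rid, (v_lsa, v_sent) in responses.items()
    ]
    scored.sort(key=lambda rs: rs[1], reverse=True)
    return scored[:3]
```

The resulting set of 3 responses is then featurized and passed to the L2 classifier for gating.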
11. Offline Training Pipeline
(Flow-diagram summary)
1. Preprocessing - email cleaning, primary & secondary preprocessing of the train data, candidate responses (n) and test data (m)
2. L1 (Embedding) Layer - training; produces candidate response vectors (n) and test vectors (m)
3. Pick top k responses per test query based on L1 scores (m*k)
4. Feature creation, followed by preprocessing - missing value imputation, outlier treatment, scaling; the featurized data is split into train (t) and test (m-t) sets with candidate responses (k)
5. L2 (Classification) Layer - training on the train split (t)
6. Relevance probability vector ((m-t)*k); pick top 3 based on probability ((m-t)*3) + evaluation
Artifacts written to Redis/S3: lookup/word vectors/idf, the classification model object, and the L1 & L2 thresholds for gating and ranking
12. Online Processing Pipeline
(Flow-diagram summary)
1. I/P Query → Preprocessing - email cleaning, primary & secondary preprocessing
2. L1 (Embedding) Layer - transformation; produces the query vector, scored against the candidate response vectors (n)
3. Pick top k based on similarity (1*k)
4. Feature creation, followed by preprocessing - missing value imputation, outlier treatment, scaling
5. L2 (Classification) Layer - prediction → relevance probability vector (1*k)
6. Pick top 3 based on probability (1*3)
Artifacts read from Redis/S3: lookup/word vectors/idf, the classification model object, and the L1 & L2 thresholds for gating and ranking
13. L1 and L2 Gating Logic
❖ The L1 score refers to the maximum of the L1 scores of the 3 candidate responses
❖ The optimal thresholds, viz. L1 upper, L2 upper, L1 lower and L2 lower, are estimated by simulating boundaries on a sample of labelled data
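One plausible reading of the four-threshold gate is sketched below: respond confidently above the upper thresholds, stay silent below the lower ones, and surface low-confidence suggestions in between. The threshold names come from the slide, but the exact decision rule is an assumption:

```python
def gate(l1_score, l2_prob, l1_lo, l1_hi, l2_lo, l2_hi):
    """Hypothetical gating rule using the L1 upper/lower and L2 upper/lower
    thresholds named on the slide. The slide does not spell out the exact
    rule; this three-way split is an illustrative assumption."""
    if l1_score >= l1_hi and l2_prob >= l2_hi:
        return "auto_respond"   # confident answer, shown as a bot response
    if l1_score < l1_lo or l2_prob < l2_lo:
        return "no_response"    # stay silent rather than guess
    return "suggest"            # middle band: show as tentative suggestions
```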
15. Periodic Model Refresh and Feedback Consumption
● Model refresh is key to ensuring that the models stay up to date and relevant over time
● This is done once a week, or as soon as an account accumulates a sizeable number of new queries or knowledge-base updates
● It involves the following steps
○ Retraining the LSA model after including the newly accumulated data
○ Incremental training of word vectors with the new data
○ Retraining the L2 model on recent data, using feedback provided by customers as the dependent variable
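The refresh trigger can be sketched as a simple predicate. The numeric thresholds are illustrative assumptions; the slide only says "a sizeable number":

```python
def needs_refresh(days_since_refresh, new_queries, kb_updates,
                  query_threshold=500, kb_threshold=10):
    """Refresh weekly, or as soon as enough new queries or knowledge-base
    updates have accumulated. The thresholds are illustrative assumptions -
    the slide does not quantify 'sizeable'."""
    return (days_since_refresh >= 7
            or new_queries >= query_threshold
            or kb_updates >= kb_threshold)
```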
16. Teach the bot
● Teach the bot is a feature that allows customer support agents to explicitly train the bot by ingesting Q → A mappings
● When the Answer bot fails to respond to a query (Q), the agent can point the bot to the expected response (A) that should have been returned
● If a suitable response (A) does not exist in the knowledge base, it can be created on the fly
● This expected response (A) is consumed and mapped to be close to the query vector (Q) in the L1 vector space
○ This ensures that article A will show up for future queries that are similar to Q
○ The same feature is re-purposed to resolve incorrect bot responses as well
○ This feature also helps improve the overall coverage of the Answer bot
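One way the "map A close to Q" step could work is a linear interpolation of the article's L1 vector toward the query vector. Both the update rule and the alpha value are illustrative assumptions, not Freshdesk's actual mechanism:

```python
from math import sqrt

def teach(article_vec, query_vec, alpha=0.5):
    """Nudge the article's L1 vector toward the query vector so the article
    surfaces for similar future queries. The interpolation rule and alpha
    are illustrative assumptions about how the mapping could be done."""
    mixed = [(1 - alpha) * a + alpha * q
             for a, q in zip(article_vec, query_vec)]
    norm = sqrt(sum(x * x for x in mixed))
    return [x / norm for x in mixed]  # keep the vector unit-length
```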
18. Metrics and business impact

Month   | # Active Clients | # Requests | # Responded | # Helpful | # No Feedback | % Deflection
May’18  | 97               | 10,805     | 6,075       | 1,657     | 1,868         | 15.34%
Jun’18  | 151              | 22,195     | 12,969      | 2,550     | 5,981         | 11.49%
July’18 | 182              | 30,376     | 19,330      | 3,792     | 5,669         | 12.48%
Aug’18  | 242              | 50,049     | 29,948      | 5,940     | 7,839         | 11.87%
Sep’18  | 347              | 63,587     | 38,064      | 8,308     | 10,112        | 13.07%
Oct’18  | 457              | 101,493    | 56,390      | 16,589    | 33,360        | 16.34%
Nov’18  | 478              | 130,687    | 78,902      | 25,680    | 46,555        | 19.65%
Dec’18  | 480              | 137,517    | 82,366      | 23,713    | 52,772        | 17.24%
● CSAT* - 79% with bots and 72% without bots
● Average First Response Time (overall) - 13 hrs with bots and 19 hrs without bots
*CSAT - Customer Satisfaction Score
19. Understanding the Metrics
● # Active clients - number of business customers who have exposed the bot to their end customers in their support portal
● # Requests - number of requests that the bot gets
● # Responded - number of requests responded/answered by the bot
● # Helpful - number of requests where the bot responses were helpful
○ Alongside every bot response, a “Was this helpful?” message is also shown and the user’s
feedback is solicited. This helps in tracking helpful responses.
● # No Feedback - number of bot responses for which there was no feedback from users
● % Deflection - Ratio of the # Helpful and # Requests
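The deflection metric above is a straightforward ratio, which the May’18 row of the metrics table illustrates:

```python
def deflection_rate(n_helpful, n_requests):
    """% Deflection as defined above: # Helpful / # Requests, in percent."""
    return 100.0 * n_helpful / n_requests

# May'18 row of the metrics table: 1,657 helpful out of 10,805 requests
rate = deflection_rate(1657, 10805)  # ~15.34, matching the table
```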
20. Why are some suggestions not helpful and some queries not answered
● The query could relate to a new topic for which there may not be enough FAQs or articles
● The query could relate to an existing topic but contain keywords that are not in the vocabulary - this may result in low L1 and L2 confidence that does not satisfy the thresholds
● The query may request a particular action - example: “Can you connect me to an agent?”, which is a question for a task-completion bot with intent-detection capabilities
● The query may not contain a question or issue - example: “I have an open ticket 3335924”
● The query may be ambiguous or unclear - example: “discussion”
21. Challenges and learnings
Challenges:
● Developing a preprocessing mechanism that can extract only the salient components from
messy emails
● Handling the complexity of storing and retrieving vectors of floats (IDFs, SVD components, word vectors) for every account
● Serving predictions at low latency
● Using the right tools to proactively monitor and find bugs in the codebase
Lessons Learnt:
● Involve data engineers at the very beginning
● Define success metrics and inform stakeholders about what a reasonable target is
● Define strategies for model refresh and feedback consumption