Yellow Belt
Case study
Amaury Beeckman, Machine learning Engineer at Sagacify
28 May 2019
Automatic Claim Email Classification
We are Sagacify
• Experts in Artificial Intelligence
• Natural Language Processing
• Computer vision
• Predictive models
• Experts in Software Development
• Web & Mobile
• R&D oriented
• Strong collaboration with universities
• Focused on moonshot ideas!
Project’s scope
Copyright Sagacify SPRL, Confidential – Do not share
Automatic claim email classification in the insurance business
1. Incoming emails
Categories
Category 1
Category 2
Category 3
…
2. Read email content
The model has learned its own set of rules that associate the text of an email with a label
3. Learned model predicts labels
ML Model
Context of the project
Main business problem
Too many categories
About a thousand!
Becomes difficult for the business
Too many possibilities to memorize
Will be complex for the ML model
There are many subtleties that the model will need to understand
Answer: Clustering
Group closely related categories together
From about a thousand to fewer than a hundred
Allows a new set of labels
Closely related to the business process
Complexity reduction for the ML model
Fewer labels that make more sense
What about Clustering
Machine learning algorithm
◼ Groups entries that are closely related
◼ Uses the Euclidean distance to each cluster mean (centroid) as its metric
◼ https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
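The bullets above can be illustrated with a minimal, dependency-free sketch of k-means, the algorithm shown in the linked visualization. The slides do not say which implementation was used, so the function names, the toy points, and the fixed starting centroids are assumptions made for illustration only. Production code would more likely rely on a library and a randomized or k-means++ initialization.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points, init_centroids, n_iter=20):
    """Return (centroids, labels) after at most n_iter iterations.

    init_centroids is passed explicitly (an assumption of this sketch)
    so that the result is deterministic.
    """
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
toy_points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
toy_centroids, toy_labels = kmeans(toy_points, init_centroids=toy_points[:2])
print(toy_labels)
```

k-means only finds a local optimum, so a different choice of starting centroids can give a different grouping.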
What about the dataset
◼ One row represents one email
◼ One column represents one class
◼ We have ~25 000 emails and 339 classes
◼ One cell is the probability that an email belongs to a particular class
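To make the layout described above concrete, here is a small sketch of such a matrix. The class names and values are invented for illustration; the real table has roughly 25 000 rows and 339 columns.

```python
# Hypothetical toy stand-in for the real matrix (rows = emails, columns = classes).
class_names = ["class_a", "class_b", "class_c"]
probabilities = [
    [0.70, 0.20, 0.10],  # email 0
    [0.05, 0.15, 0.80],  # email 1
    [0.40, 0.45, 0.15],  # email 2
]


def predicted_class(row, names):
    """Return the name of the most probable class for one email (one row)."""
    best = max(range(len(row)), key=lambda j: row[j])
    return names[best]


def class_column(matrix, j):
    """Return column j: the probability of class j across all emails."""
    return [row[j] for row in matrix]


predictions = [predicted_class(row, class_names) for row in probabilities]
print(predictions)
print(class_column(probabilities, 0))
```

Reading the table by row gives the prediction for one email; reading it by column gives a profile of one class across all emails, which is the view that matters when comparing classes with each other.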
It’s time for a Jupyter notebook
yellow_case_study.ipynb
The whole process is more complex
First Step: Deep-Learning
Categories
Probability of category 1
Probability of category 2
Probability of category 3
…
Text input
The model has learned its own set of rules that associate the text of an email with a label
Deep-Learning model
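The slide does not describe how the model turns its internal scores into per-category probabilities. A common choice for single-label classifiers is a softmax output layer; the sketch below assumes that choice, and the scores are made up.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one email over three categories.
probas = softmax([2.0, 1.0, 0.1])
print(probas)
```

Each call produces one row of the probability matrix described earlier: one value per category for a single email.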
Second step: Clustering algorithms
◼ Same idea as what we have already done.
◼ Start with output probabilities of our Deep-Learning model
◼ Cluster the emails in different groups
◼ Use Graph theory to link closely related classes together
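The slides do not say how the graph was built, so the following is only one plausible reading of the last bullet: treat each class as a node, link two classes when their probability columns are similar (cosine similarity above a threshold, both assumptions of this sketch), and take each connected component as a merged label. All names and numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_classes(matrix, threshold):
    """Build an undirected graph linking classes with similar columns."""
    columns = list(zip(*matrix))  # one tuple of probabilities per class
    graph = {i: set() for i in range(len(columns))}
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if cosine(columns[i], columns[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def connected_components(graph):
    """Return the groups of nodes that are reachable from one another."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Hypothetical 4 emails x 4 classes: the model confuses classes 0 and 1.
toy_matrix = [
    [0.50, 0.45, 0.03, 0.02],
    [0.45, 0.50, 0.02, 0.03],
    [0.02, 0.03, 0.90, 0.05],
    [0.03, 0.02, 0.05, 0.90],
]
merged = connected_components(link_classes(toy_matrix, threshold=0.9))
print(merged)
```

Classes that the model rarely tells apart end up in the same component, which gives a candidate merged label to review with the business.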
Third step: Validation with business
◼ The results must be validated by the business
◼ We held several focus sessions to derive the ideal labelling
○ That perfectly matches the company's process
○ That makes sense algorithmically for our models.
“Just like electricity did 100 years ago, artificial intelligence will revolutionize all industry”
– Andrew Ng
“The value of AI is not to be found in the models themselves, but in organizations’ abilities to harness them”
– McKinsey Global Institute – April 2018

AI Yellow Belt - Day 1 - case by Sagacify
