SlideShare uma empresa Scribd logo
1 de 29
Baixar para ler offline
1. Why Data Science?
2. Personal Experience
a. Evolution
b. Machine Learning Challenges
3. Business Process Automation
a. RPA
b. Chatbots
c. Data Analytics
d. Data Science
4. Demo
a. Multiple Text Classification
b. Deep Learning vs Standard NLP approach
5. Summary
AGENDA
WHY DATA SCIENCE ?
Data Science will be future and I can work
on the interesting projects for the next years
PERSONAL EVOLUTION
★ Data Sets
★ Contest
★ Codes
Kaggle - Data Science as a Competition
★ Self-learning
★ Recruitment
★ Algorithms for Companies
BUSINESS PROCESS AUTOMATION
★ Automation of the business processes within organization
○ minimize costs, time and human errors
○ existing toolkits or open source to implement scripts
RPA (Robotics Process Automation)
Steps to implement RPA
I. Find and describe business process
II. Estimate complexity and cost saving
III. Implementation, deployment and monitoring
robonomika.pl
www.economist.com/business/2018/09/15/ai-may-not-be-bad-news-for-workers
RPA - use cases, a little brainstorm
blog.appliedai.com/robotic-process-automation-use-cases
Possible areas
➢ Customer Service
➢ Data Management
➢ Financial Services
➢ Human Resources
➢ Operations
➢ Procurement
➢ Reporting
➢ Sales
Brief History
1966
NLP Computer program ELIZA, @MIT AI Lab, Weizenbaum
CHATBOTS
Automatic conversation with Customers (Virtual Agent)
2018
Real Conversation via Phone Call and IOT Devices
★ TTS (Text to Speech)
★ SR (Speech Recognition Engine)
AI Assistant calls local business
youtube.com/watch?v=D5VN56jQMWM
Command based and structured
structured questions and answers
AI based
based on Machine Learning
CHATBOTS
Cons
- not every business
- limitted audience
- retention
- mistakes
Pros
- fast
- cheap
- 24/7
- no queues
★ Plug and play solution, market research (big players, startups, tailored)
CHATBOTS - tips and tricks
★ Microsoft Tay Project AI ChatBot via Twitter, closed after 16 hours
○ Insane Chatbot, Offensive tweets (racism, holocaust, etc.)
★ How to feed chatbot with data
○ Historical data (Logs from Data Warehouse)
○ Clean and extract conversation via emails, historical chat session, website forms
○ Analize the most common issues (e.g. 70% of all contacts can be in TOP 30)
○ Reorganization of workflow in FAQ, quick answers to the most common issues
○ Create user friendly error messages
○ Monitor, analyze, keep learning, testing, switch on/off ML (self-learning)
★ Implement bots from scratch (additional cost, better control)
○ possible, tutorials, online courses, samples codes available in Internet
Standard KPIs
★ Satisfaction
★ Response Time
★ Retention
★ Productivity
★ Turn Over
★ Cash Flows
are the current metrics enough?
Data Analysis
Advanced Analytics
Behavioral Analysis | Gamification | Marketing Campaigns
Segmentations | Cross and Upselling Strategies
Personalize Notifications | Predictive Modeling
Global Decision Tree with all possible policy
★ easily expandable
★ can differ according to geolocation (policy)
★ require constant reorganization
★ could be a possible bottleneck
○ remove “dead paths”
○ periodical simplification
Data Analysis - GDT
Data Science Workflow
★ Accuracy (%)
★ Retrain Models
4C in DS
Creativeness
Cleverness
Curiosity
Communicativeness
Machine Learning is 90% building pipelines and 10% running algorithms
Content based classification | Document organization | Knowledge discovery
Automatic Document Categorization
★ Emails, Chats, Logs
★ Documents, Scans (with OCR)
★ Websites
★ Spam filtering
★ Genre detection
★ Sentiment analysis
★ Language
identification
❏ Engines, APIs with pretrained models or build your own
❏ Open Source & SOTA (state of the art) approach
❏ Pipeline (Standard NLP)
❏ Deep Learning (RNN + BLSTM)
❏ etc.
Automatic Document Categorization
Categorize consumer financial complaints into 11 classes
Challenge description
Randomly ~9% Acc
Thousands of the consumer complaints weekly
about financial products and services
Need classification before sending final respond within 15 days
Data Train / Test 66.8k / 1k
Test 1k Data
Final Prediction 73.7% on 1000 samples
TensorFlow Model, train in 2 minutes
e.g. standard NLP approach
RAW DATA
Transformation
Data Filtering
Removing:
- tags
- duplicates
Formatting
CLEANING
Stemming
Case Folding
Removing:
- Stop words
- Double Letters
Add New Features
FEATURE
EXTRACTION
TF-IDF
Word2Vec
Data Enrichment
(non Textual)
Other embeddings
FastText
ELMo
TUNING
DL Topology
Hyperparameters
Grid Search
Ensemble methods
Confusion Matrix
ML MODELS
Kerras
BRNN & LSTM
Naive Bayes
Regression
Linear SVM
SVM (Kernel)
Random Forests
etc.
API
Final Model
Ready to
Deploy
Retrain and
Update Model
★ Category label
★ Probability of the correct categorization
Deep Learning, model in Keras
playground.tensorflow.org Run and Modify Neural Networks, no Coding Skills Required
training ~15 minutes
Github vs Standard vs Keras
73.7%
93.1%
89.6%RNN with BLSTM
★ Ensemble method
★ Imbalanced data (Imputation, e.g. SMOTE Algorithm)
★ Provide correct annotations
%
★ TPU/GPU (speed up Deep Learning Models trained locally)
ML new opportunities TPU/GPU/Clouds
★ Cloud Computing (VM’s with possible ML solutions)
○ Big players & new solutions
○ Cheap data storage per hour
○ Decreasing storage and computational costs in time
(Energy price is limit)
○ Leverage dataset with storage system
○ Additional cost with data migration, transition
○ Speed up data scientist work
○ Limitations (downtime, security, privacy, flexibility)
SUMMARY
partner-api.groupon.com
github.com/groupon
GROUPON
PARTNER
NETWORK
We are growing in Katowice
1 - 10 - 6k - 15 - 49M - 1.5B
Gdansk mlgdansk.pl - on going bi-weekly
Technical meetings about Data Science in Tricity
Machine Learning initiative soon in Silesia
16-18 XI Gliwice - skyhacks#1 AI Hackathon
Invitation
AI is the future, if I did not believe it, I would not do it
akarwan@groupon.com

Mais conteúdo relacionado

Semelhante a infoShare AI Roadshow 2018 - Adam Karwan (Groupon) - Jak wykorzystać uczenie maszynowe (Machine Learning) w celu usprawnienia procesów biznesowych organizacji

Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
vitm11
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
Nikhil Garg
 

Semelhante a infoShare AI Roadshow 2018 - Adam Karwan (Groupon) - Jak wykorzystać uczenie maszynowe (Machine Learning) w celu usprawnienia procesów biznesowych organizacji (20)

Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
What is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PMWhat is Data as a Service by T-Mobile Principle Technical PM
What is Data as a Service by T-Mobile Principle Technical PM
 
Storage Challenges for Production Machine Learning
Storage Challenges for Production Machine LearningStorage Challenges for Production Machine Learning
Storage Challenges for Production Machine Learning
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Msst 2019 v4
Msst 2019 v4Msst 2019 v4
Msst 2019 v4
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
 
How to Use Deep Learning by Mu Sigma Product Manager
How to Use Deep Learning by Mu Sigma Product ManagerHow to Use Deep Learning by Mu Sigma Product Manager
How to Use Deep Learning by Mu Sigma Product Manager
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
Data ops: Machine Learning in production
Data ops: Machine Learning in productionData ops: Machine Learning in production
Data ops: Machine Learning in production
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0
 
Text Analytics for Legal work
Text Analytics for Legal workText Analytics for Legal work
Text Analytics for Legal work
 
Predicting Tweet Sentiment
Predicting Tweet SentimentPredicting Tweet Sentiment
Predicting Tweet Sentiment
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
 

Mais de Infoshare

infoShare AI Roadshow 2018 - Wojtek Ptak (Freshmail) - Teraz - najlepszy czas...
infoShare AI Roadshow 2018 - Wojtek Ptak (Freshmail) - Teraz - najlepszy czas...infoShare AI Roadshow 2018 - Wojtek Ptak (Freshmail) - Teraz - najlepszy czas...
infoShare AI Roadshow 2018 - Wojtek Ptak (Freshmail) - Teraz - najlepszy czas...
Infoshare
 
infoShare 2014: Peter Taylor,The Art of Productive Laziness
infoShare 2014: Peter Taylor,The Art of Productive LazinessinfoShare 2014: Peter Taylor,The Art of Productive Laziness
infoShare 2014: Peter Taylor,The Art of Productive Laziness
Infoshare
 
infoShare 2014: Paweł Brodziński, Efektywny czy zajęty? Jesteś pewien, że dob...
infoShare 2014: Paweł Brodziński, Efektywny czy zajęty? Jesteś pewien, że dob...infoShare 2014: Paweł Brodziński, Efektywny czy zajęty? Jesteś pewien, że dob...
infoShare 2014: Paweł Brodziński, Efektywny czy zajęty? Jesteś pewien, że dob...
Infoshare
 
infoShare 2014: Michał Sierzputowski, Testy automatyczne aplikacji webowych o...
infoShare 2014: Michał Sierzputowski, Testy automatyczne aplikacji webowych o...infoShare 2014: Michał Sierzputowski, Testy automatyczne aplikacji webowych o...
infoShare 2014: Michał Sierzputowski, Testy automatyczne aplikacji webowych o...
Infoshare
 
infoShare 2014: Michał Polak, Smart Cities na wyciągnięcie smartfona.
infoShare 2014: Michał Polak, Smart Cities na wyciągnięcie smartfona.infoShare 2014: Michał Polak, Smart Cities na wyciągnięcie smartfona.
infoShare 2014: Michał Polak, Smart Cities na wyciągnięcie smartfona.
Infoshare
 
infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarza...
infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarza...infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarza...
infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarza...
Infoshare
 
infoShare 2014: Marek Landowski, Architektura SWIFT obiektowego przechowywani...
infoShare 2014: Marek Landowski, Architektura SWIFT obiektowego przechowywani...infoShare 2014: Marek Landowski, Architektura SWIFT obiektowego przechowywani...
infoShare 2014: Marek Landowski, Architektura SWIFT obiektowego przechowywani...
Infoshare
 
infoShare 2014: Maciej Saganowski, Designing Mobile Services.
infoShare 2014: Maciej Saganowski, Designing Mobile Services.infoShare 2014: Maciej Saganowski, Designing Mobile Services.
infoShare 2014: Maciej Saganowski, Designing Mobile Services.
Infoshare
 
infoShare 2014:Jacek Samsel, Andrzej Pyra, Marketingowa mapa skarbów: w jaki ...
infoShare 2014:Jacek Samsel, Andrzej Pyra, Marketingowa mapa skarbów: w jaki ...infoShare 2014:Jacek Samsel, Andrzej Pyra, Marketingowa mapa skarbów: w jaki ...
infoShare 2014:Jacek Samsel, Andrzej Pyra, Marketingowa mapa skarbów: w jaki ...
Infoshare
 

Mais de Infoshare (20)

infoShare AI Roadshow 2018 - Barbara Leśniarek (Elitmind) - ”Łapać złodzieja!...
infoShare AI Roadshow 2018 - Barbara Leśniarek (Elitmind) - ”Łapać złodzieja!...infoShare AI Roadshow 2018 - Barbara Leśniarek (Elitmind) - ”Łapać złodzieja!...
infoShare AI Roadshow 2018 - Barbara Leśniarek (Elitmind) - ”Łapać złodzieja!...
 
infoShare AI Roadshow 2018 - Wojtek Ptak (Freshmail) - Teraz - najlepszy czas...
infoShare AI Roadshow 2018 - Wojtek Ptak (Freshmail) - Teraz - najlepszy czas...infoShare AI Roadshow 2018 - Wojtek Ptak (Freshmail) - Teraz - najlepszy czas...
infoShare AI Roadshow 2018 - Wojtek Ptak (Freshmail) - Teraz - najlepszy czas...
 
infoShare AI Roadshow 2018 - Tomasz Brzeziński (iTaxi) - Niestandardowe metod...
infoShare AI Roadshow 2018 - Tomasz Brzeziński (iTaxi) - Niestandardowe metod...infoShare AI Roadshow 2018 - Tomasz Brzeziński (iTaxi) - Niestandardowe metod...
infoShare AI Roadshow 2018 - Tomasz Brzeziński (iTaxi) - Niestandardowe metod...
 
infoShare AI Roadshow 2018 - Rafał Cycoń (ShelfWise.ai) - Wyzwania w tworzeni...
infoShare AI Roadshow 2018 - Rafał Cycoń (ShelfWise.ai) - Wyzwania w tworzeni...infoShare AI Roadshow 2018 - Rafał Cycoń (ShelfWise.ai) - Wyzwania w tworzeni...
infoShare AI Roadshow 2018 - Rafał Cycoń (ShelfWise.ai) - Wyzwania w tworzeni...
 
infoShare AI Roadshow 2018 - Paweł Wyborski (QuarticON) - Jak AI pomaga sprze...
infoShare AI Roadshow 2018 - Paweł Wyborski (QuarticON) - Jak AI pomaga sprze...infoShare AI Roadshow 2018 - Paweł Wyborski (QuarticON) - Jak AI pomaga sprze...
infoShare AI Roadshow 2018 - Paweł Wyborski (QuarticON) - Jak AI pomaga sprze...
 
infoShare AI Roadshow 2018 - Mateusz Biliński (Niebezpiecznik) - Hackowanie (...
infoShare AI Roadshow 2018 - Mateusz Biliński (Niebezpiecznik) - Hackowanie (...infoShare AI Roadshow 2018 - Mateusz Biliński (Niebezpiecznik) - Hackowanie (...
infoShare AI Roadshow 2018 - Mateusz Biliński (Niebezpiecznik) - Hackowanie (...
 
infoShare AI Roadshow 2018 - Krzysztof Kudryński & Błażej Kubiak (TomTom) - D...
infoShare AI Roadshow 2018 - Krzysztof Kudryński & Błażej Kubiak (TomTom) - D...infoShare AI Roadshow 2018 - Krzysztof Kudryński & Błażej Kubiak (TomTom) - D...
infoShare AI Roadshow 2018 - Krzysztof Kudryński & Błażej Kubiak (TomTom) - D...
 
infoShare AI Roadshow 2018 - Magdalena Wójcik (Data Love) - Data Science na d...
infoShare AI Roadshow 2018 - Magdalena Wójcik (Data Love) - Data Science na d...infoShare AI Roadshow 2018 - Magdalena Wójcik (Data Love) - Data Science na d...
infoShare AI Roadshow 2018 - Magdalena Wójcik (Data Love) - Data Science na d...
 
infoShare AI Roadshow 2018 - Adrian Boguszewski (Linux Polska) - Czy sieć neu...
infoShare AI Roadshow 2018 - Adrian Boguszewski (Linux Polska) - Czy sieć neu...infoShare AI Roadshow 2018 - Adrian Boguszewski (Linux Polska) - Czy sieć neu...
infoShare AI Roadshow 2018 - Adrian Boguszewski (Linux Polska) - Czy sieć neu...
 
infoShare AI Roadshow 2018 - Dorian Nikoniuk (Microsoft) - Usługi poznawcze, ...
infoShare AI Roadshow 2018 - Dorian Nikoniuk (Microsoft) - Usługi poznawcze, ...infoShare AI Roadshow 2018 - Dorian Nikoniuk (Microsoft) - Usługi poznawcze, ...
infoShare AI Roadshow 2018 - Dorian Nikoniuk (Microsoft) - Usługi poznawcze, ...
 
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...
 
infoShare AI Roadshow 2018 - Michał Ćwiok (Clouds on Mars) - Usługi AI w chmurze
infoShare AI Roadshow 2018 - Michał Ćwiok (Clouds on Mars) - Usługi AI w chmurzeinfoShare AI Roadshow 2018 - Michał Ćwiok (Clouds on Mars) - Usługi AI w chmurze
infoShare AI Roadshow 2018 - Michał Ćwiok (Clouds on Mars) - Usługi AI w chmurze
 
infoShare 2014: Peter Taylor,The Art of Productive Laziness
infoShare 2014: Peter Taylor,The Art of Productive LazinessinfoShare 2014: Peter Taylor,The Art of Productive Laziness
infoShare 2014: Peter Taylor,The Art of Productive Laziness
 
infoShare 2014: Paweł Brodziński, Efektywny czy zajęty? Jesteś pewien, że dob...
infoShare 2014: Paweł Brodziński, Efektywny czy zajęty? Jesteś pewien, że dob...infoShare 2014: Paweł Brodziński, Efektywny czy zajęty? Jesteś pewien, że dob...
infoShare 2014: Paweł Brodziński, Efektywny czy zajęty? Jesteś pewien, że dob...
 
infoShare 2014: Michał Sierzputowski, Testy automatyczne aplikacji webowych o...
infoShare 2014: Michał Sierzputowski, Testy automatyczne aplikacji webowych o...infoShare 2014: Michał Sierzputowski, Testy automatyczne aplikacji webowych o...
infoShare 2014: Michał Sierzputowski, Testy automatyczne aplikacji webowych o...
 
infoShare 2014: Michał Polak, Smart Cities na wyciągnięcie smartfona.
infoShare 2014: Michał Polak, Smart Cities na wyciągnięcie smartfona.infoShare 2014: Michał Polak, Smart Cities na wyciągnięcie smartfona.
infoShare 2014: Michał Polak, Smart Cities na wyciągnięcie smartfona.
 
infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarza...
infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarza...infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarza...
infoShare 2014: Mariusz Róg, Big Data w praktyce -- jak efektywnie przetwarza...
 
infoShare 2014: Marek Landowski, Architektura SWIFT obiektowego przechowywani...
infoShare 2014: Marek Landowski, Architektura SWIFT obiektowego przechowywani...infoShare 2014: Marek Landowski, Architektura SWIFT obiektowego przechowywani...
infoShare 2014: Marek Landowski, Architektura SWIFT obiektowego przechowywani...
 
infoShare 2014: Maciej Saganowski, Designing Mobile Services.
infoShare 2014: Maciej Saganowski, Designing Mobile Services.infoShare 2014: Maciej Saganowski, Designing Mobile Services.
infoShare 2014: Maciej Saganowski, Designing Mobile Services.
 
infoShare 2014:Jacek Samsel, Andrzej Pyra, Marketingowa mapa skarbów: w jaki ...
infoShare 2014:Jacek Samsel, Andrzej Pyra, Marketingowa mapa skarbów: w jaki ...infoShare 2014:Jacek Samsel, Andrzej Pyra, Marketingowa mapa skarbów: w jaki ...
infoShare 2014:Jacek Samsel, Andrzej Pyra, Marketingowa mapa skarbów: w jaki ...
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

infoShare AI Roadshow 2018 - Adam Karwan (Groupon) - Jak wykorzystać uczenie maszynowe (Machine Learning) w celu usprawnienia procesów biznesowych organizacji

  • 1.
  • 2. 1. Why Data Science? 2. Personal Experience a. Evolution b. Machine Learning Challenges 3. Business Process Automation a. RPA b. Chatbots c. Data Analytics d. Data Science 4. Demo a. Multiple Text Classification b. Deep Learning vs Standard NLP approach 5. Summary AGENDA
  • 4. Data Science will be future and I can work on the interesting projects for the next years
  • 6. ★ Data Sets ★ Contest ★ Codes Kaggle - Data Science as a Competition ★ Self-learning ★ Recruitment ★ Algorithms for Companies
  • 8. ★ Automation of the business processes within organization ○ minimize costs, time and human errors ○ existing toolkits or open source to implement scripts RPA (Robotics Process Automation) Steps to implement RPA I. Find and describe business process II. Estimate complexity and cost saving III. Implementation, deployment and monitoring robonomika.pl www.economist.com/business/2018/09/15/ai-may-not-be-bad-news-for-workers
  • 9. RPA - use cases, a little brainstorm blog.appliedai.com/robotic-process-automation-use-cases Possible areas ➢ Customer Service ➢ Data Management ➢ Financial Services ➢ Human Resources ➢ Operations ➢ Procurement ➢ Reporting ➢ Sales
  • 10. Brief History 1966 NLP Computer program ELIZA, @MIT AI Lab, Weizenbaum CHATBOTS Automatic conversation with Customers (Virtual Agent) 2018 Real Conversation via Phone Call and IOT Devices ★ TTS (Text to Speech) ★ SR (Speech Recognition Engine) AI Assistant calls local business youtube.com/watch?v=D5VN56jQMWM
  • 11. Command based and structured structured questions and answers AI based based on Machine Learning CHATBOTS Cons - not every business - limitted audience - retention - mistakes Pros - fast - cheap - 24/7 - no queues
  • 12. ★ Plug and play solution, market research (big players, startups, tailored) CHATBOTS - tips and tricks ★ Microsoft Tay Project AI ChatBot via Twitter, closed after 16 hours ○ Insane Chatbot, Offensive tweets (racism, holocaust, etc.) ★ How to feed chatbot with data ○ Historical data (Logs from Data Warehouse) ○ Clean and extract conversation via emails, historical chat session, website forms ○ Analize the most common issues (e.g. 70% of all contacts can be in TOP 30) ○ Reorganization of workflow in FAQ, quick answers to the most common issues ○ Create user friendly error messages ○ Monitor, analyze, keep learning, testing, switch on/off ML (self-learning) ★ Implement bots from scratch (additional cost, better control) ○ possible, tutorials, online courses, samples codes available in Internet
  • 13. Standard KPIs ★ Satisfaction ★ Response Time ★ Retention ★ Productivity ★ Turn Over ★ Cash Flows are the current metrics enough? Data Analysis Advanced Analytics Behavioral Analysis | Gamification | Marketing Campaigns Segmentations | Cross and Upselling Strategies Personalize Notifications | Predictive Modeling
  • 14. Global Decision Tree with all possible policy ★ easily expandable ★ can differ according to geolocation (policy) ★ require constant reorganization ★ could be a possible bottleneck ○ remove “dead paths” ○ periodical simplification Data Analysis - GDT
  • 15. Data Science Workflow ★ Accuracy (%) ★ Retrain Models 4C in DS Creativeness Cleverness Curiosity Communicativeness Machine Learning is 90% building pipelines and 10% running algorithms
  • 16. Content based classification | Document organization | Knowledge discovery Automatic Document Categorization ★ Emails, Chats, Logs ★ Documents, Scans (with OCR) ★ Websites ★ Spam filtering ★ Genre detection ★ Sentiment analysis ★ Language identification
  • 17. ❏ Engines, APIs with pretrained models or build your own ❏ Open Source & SOTA (state of the art) approach ❏ Pipeline (Standard NLP) ❏ Deep Learning (RNN + BLSTM) ❏ etc. Automatic Document Categorization
  • 18. Categorize consumer financial complaints into 11 classes Challenge description Randomly ~9% Acc Thousands of the consumer complaints weekly about financial products and services Need classification before sending final respond within 15 days
  • 19. Data Train / Test 66.8k / 1k
  • 21. Final Prediction 73.7% on 1000 samples TensorFlow Model, train in 2 minutes
  • 22. e.g. standard NLP approach RAW DATA Transformation Data Filtering Removing: - tags - duplicates Formatting CLEANING Stemming Case Folding Removing: - Stop words - Double Letters Add New Features FEATURE EXTRACTION TF-IDF Word2Vec Data Enrichment (non Textual) Other embeddings FastText ELMo TUNING DL Topology Hyperparameters Grid Search Ensemble methods Confusion Matrix ML MODELS Kerras BRNN & LSTM Naive Bayes Regression Linear SVM SVM (Kernel) Random Forests etc. API Final Model Ready to Deploy Retrain and Update Model ★ Category label ★ Probability of the correct categorization
  • 23. Deep Learning, model in Keras playground.tensorflow.org Run and Modify Neural Networks, no Coding Skills Required training ~15 minutes
  • 24. Github vs Standard vs Keras 73.7% 93.1% 89.6%RNN with BLSTM ★ Ensemble method ★ Imbalanced data (Imputation, e.g. SMOTE Algorithm) ★ Provide correct annotations %
  • 25. ★ TPU/GPU (speed up Deep Learning Models trained locally) ML new opportunities TPU/GPU/Clouds ★ Cloud Computing (VM’s with possible ML solutions) ○ Big players & new solutions ○ Cheap data storage per hour ○ Decreasing storage and computational costs in time (Energy price is limit) ○ Leverage dataset with storage system ○ Additional cost with data migration, transition ○ Speed up data scientist work ○ Limitations (downtime, security, privacy, flexibility)
  • 28. We are growing in Katowice 1 - 10 - 6k - 15 - 49M - 1.5B
  • 29. Gdansk mlgdansk.pl - on going bi-weekly Technical meetings about Data Science in Tricity Machine Learning initiative soon in Silesia 16-18 XI Gliwice - skyhacks#1 AI Hackathon Invitation AI is the future, if I did not believe it, I would not do it akarwan@groupon.com