AnswerBot
Introduction
• Avkash Chauhan, H2O.ai
o Head of enterprise products and customers
o @avkashchauhan | https://www.linkedin.com/in/avkashchauhan
• Products
o H2O
o Sparkling Water
o Deep Water
• NN – Tensorflow, mxnet, Caffe
• GPU
• xgboost - Distributed
What is an AnswerBot?
• An AnswerBot is a standalone intelligent application
• AnswerBot uses machine learning to respond to user input
• Provide relevant knowledge base articles as answers
• Self-service customer base
• Raises awareness of knowledge base offerings
• Generate product feedback silently
AnswerBot – Client Interface
AnswerBot – Result Interface
[Screenshot: two lists of possible answers, each answer shown with a match score (60%, 42%) and a "More.." link]
AnswerBot – Administrator Interface
[Screenshot: for each Question, the administrator view shows Tags, Sentiment (Positive/Negative), Priority (Low/Medium/High/Critical), Sex (Male/Female), Ratings, an NSFW count, and the Top (n) Answers with percentages]
AnswerBot in production - Teaser
[Diagram: a machine learning pipeline prototype to get the top N matching answers. Labels: Community, Stackoverflow, Reddit, Quora, Slack, Bot, Support Portal; AWS API Gateway, AWS Lambda (Question Scoring), S3, DynamoDB, AWS SQS, AWS SNS; Scoring Pipeline, Model Preparation Process, Model Production]
Problems to solve
• Finding proper tags
• Finding & Removing NSFW words
• Sentiment in the question (positive or negative)
• Priority to find the answer (Low, medium, high, critical)
• Can we figure out if the questioner is male or female?
• Question rating (How well was the question written?)
• Finding the best available answers
• Duplicate Questions
Problems to solve – Solutions (Part 1)
1. Finding proper tags:
1. Word embeddings
2. Matching words
2. Finding & Removing NSFW words
1. Brute Force Search
2. NLTK Stop Words
3. Sentiment in the question: (Positive or Negative)
1. Binomial (2 classes) classification
1. Tree Based Algorithms (GBM/RF/DRF) or NN
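The brute-force search for NSFW words listed above can be sketched as a lookup against a word list. The block list and the function name below are hypothetical placeholders, not taken from the talk; a real system would load a curated list.

```python
import re

# Hypothetical placeholder block list; a real system would load a curated one.
BLOCKED_WORDS = {"darn", "heck"}


def remove_blocked_words(text, blocked=BLOCKED_WORDS):
    """Return the cleaned text and the number of blocked words removed."""
    kept, removed = [], 0
    for token in text.split():
        # Compare on a lowercased token stripped of surrounding punctuation.
        core = re.sub(r"^\W+|\W+$", "", token).lower()
        if core in blocked:
            removed += 1
        else:
            kept.append(token)
    return " ".join(kept), removed


cleaned, count = remove_blocked_words("Why the heck does my model fail?")
print(cleaned, count)
```

Returning the count alongside the cleaned text makes it possible to report how many NSFW words a question contained.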
Problems to solve – Solutions (Part 2)
1. Priority to find the answer (Low/Medium/High/Critical)
1. Multinomial (4 classes) classification
1. Tree based algorithms (GBM/RF/DRF) or NN
2. Can we figure out if the questioner is male or female?
1. Binomial (2 classes) classification
1. Tree based algorithms (GBM/RF/DRF) or NN
3. Question rating (How the question was written?)
1. Multinomial (N class – 1-5 star) classification
1. Tree Based algorithms or NN
Problems to solve – Solutions (Part 3)
1. Finding the best available answers
1. Looking for the tags and keywords – Clustering / Reduction
2. Creating tag & keyword weights for each question
3. Matching tags, keywords and their weights to find top probabilities
2. Duplicate Questions
1. Quora posed the same problem as a Kaggle competition
1. https://www.kaggle.com/c/quora-question-pairs/data
2. https://www.kaggle.com/anokas/data-analysis-xgboost-starter-0-35460-lb
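Matching tags, keywords, and their weights to rank answers might look like the following sketch. The scoring rule (sum of the products of shared tag weights, reported as a share of the total) and all sample data are assumptions for illustration; the slides do not specify a formula.

```python
def match_score(question_weights, answer_weights):
    """Sum the products of weights for tags present in both dictionaries."""
    shared = set(question_weights) & set(answer_weights)
    return sum(question_weights[t] * answer_weights[t] for t in shared)


def top_answers(question_weights, answers, n=3):
    """Rank answers by score and report each score as a share of the total."""
    scores = {aid: match_score(question_weights, w) for aid, w in answers.items()}
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]
    if total == 0:
        return [(aid, 0.0) for aid, _ in ranked]
    return [(aid, score / total) for aid, score in ranked]


# Hypothetical tag weights for one question and three knowledge base articles.
question = {"gpu": 0.6, "install": 0.4}
articles = {
    728: {"gpu": 0.5, "install": 0.5},
    718: {"gpu": 1.0},
    800: {"export": 1.0},
}
result = top_answers(question, articles, n=2)
print(result)
```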
Data Preparation
• Real Data
o Real Question/Answers
• StackOverflow, Community, Quora, Support System
• Experimental Data
o Yelp – 41M reviews in 1-5 stars category - Supervised
• Ratings: 1-5
o Twitter Sentiment – Search it OR Mine It - Supervised
• Positive/Negative
• Male/Female
Our Experimentation Today
• Classifying sentences to predict
o Ratings: Stars (1-5)
• Multinomial classification example
o Sentiments: Positive or Negative
• Binomial classification example
Demo
• Binomial & Multinomial Classification
$ python PredictNow.py
Why Keras?
• High-level Python API that runs on top of TensorFlow and Theano
• Great for quick experimentation
• Supports both CNNs and RNNs, and combinations of the two
• Runs on CPU and GPU
• Visit: https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html
Word2vec
• Word2vec is a neural-network-based word embedding method.
• A neural network with only 1 linear hidden layer
o The hidden layer transforms the inputs into something that the output layer can use.
o Each hidden unit has a linear activation
• Represents words in a continuous, low-dimensional vector space (i.e., the embedding space)
o Semantically similar words are mapped to nearby points.
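To illustrate "nearby points": closeness in an embedding space is commonly measured with cosine similarity. The three-dimensional vectors below are made up for illustration; trained word vectors typically have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, not trained vectors).
embeddings = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "slow": [0.0, 0.2, 0.9],
}

sim_close = cosine_similarity(embeddings["good"], embeddings["great"])
sim_far = cosine_similarity(embeddings["good"], embeddings["slow"])
print(sim_close > sim_far)
```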
Understanding Dataset
• Ratings Analysis
o review,stars
o The food is WAAAAY overpriced and totally not worth it, they charged for the salsa and the service was ridiculously slow....The guacamole was good though., 2
o Decent food at a great price. Unfortunately, the place is so jam packed it's almost an inconvenience to head back to the buffet lines., 2
o Love getting my haircut here! It's only $25 for a women's haircut. I'm pretty picky about how much my hair is layered and I've never had a problem here. Make sure to call in to schedule your appointment ahead of time during the school year because she's usually booked two days in advance., 5
• Sentiment Analysis
o Text, Sentiment
o I lost $80 today I know I shouldn't put things in my back pocket but I was about to put in my bag when I realized it was gone., 0
o Just got back from Seattle. Lots of crowds. Nordstrom was nuts. But Taphouse Grill was practically empty. Found hardcover of Mad Love!, 1
o Crunch week! This Friday, I'll be heading to Oddmall, my first major craft fair, in Hudson, Ohio! I'm tricking out the website., 1
o Another beautiful day out today!! Going to build some models first then go for running!! 1
o Tired. Just tired. Home time!! I'm weaksauce, I know , 0
Components & Experimentation
• Keras
• Tensorflow
o GPU
• NLTK
o Using Stop Words
• GloVe
o Pre-trained word vectors
o Small (400K words)
• Python
• Jupyter notebook
Experimentation – Part 1
1. Data Preparation
2. Creating word collection
1. Removing stop words
2. Collecting all words into a big list
3. Tokenization and uniform data collection
1. Using full words collection
2. Get unique words in our collection
3. Tokenize at sentence level
4. Final Dataset
1. Sentences [sentences_per_record, length] - X
2. Labels [label_per_record, length] – Y
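The word collection, tokenization, and final dataset steps above can be sketched in plain Python. The stop-word set, the sample records, and MAX_LEN are hypothetical stand-ins; the talk uses NLTK stop words and Keras, whose utilities would replace these helpers in practice.

```python
# Small stand-in stop-word set; the talk uses the NLTK stop-word list.
STOP_WORDS = {"the", "is", "a", "and", "was", "it"}
MAX_LEN = 5  # assumed fixed sentence length after padding


def clean(sentence):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]


def build_index(sentences):
    """Map each unique word to an integer id; 0 is reserved for padding."""
    index = {}
    for sentence in sentences:
        for word in clean(sentence):
            index.setdefault(word, len(index) + 1)
    return index


def encode(sentence, index, max_len=MAX_LEN):
    """Convert a sentence to a fixed-length list of word ids."""
    ids = [index[w] for w in clean(sentence) if w in index][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical (sentence, label) records.
records = [("the food was good", 1), ("service was slow and rude", 0)]
word_index = build_index(s for s, _ in records)
X = [encode(s, word_index) for s, _ in records]
Y = [label for _, label in records]
print(X, Y)
```

Padding every sentence to the same length gives X a uniform shape, which is what a fixed-size input layer expects.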
Experimentation – Part 2
4. Splitting dataset to training and validation
5. Creating Embedding Matrix
o Loading predefined word vector
o Finding matching words in our collection and creating the embedding matrix
6. Creating Embedding Layer/Configuration
7. Training
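The dataset split and the embedding matrix might be sketched as below. GloVe text files store one word per line followed by its vector components; the two-dimensional sample lines, the word index, and the 80/20 split ratio are assumptions for illustration.

```python
def parse_glove(lines):
    """Parse 'word v1 v2 ...' lines into a word -> vector dictionary."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) > 1:
            vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for word id i; unmatched words stay all zeros."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix


def split(samples, validation_fraction=0.2):
    """Hold out the last fraction of samples for validation."""
    cut = len(samples) - int(len(samples) * validation_fraction)
    return samples[:cut], samples[cut:]


# Hypothetical 2-dimensional vectors in GloVe's text layout.
glove_lines = ["food 0.1 0.2", "good 0.3 0.4"]
word_index = {"food": 1, "good": 2, "weaksauce": 3}
matrix = build_embedding_matrix(word_index, parse_glove(glove_lines), dim=2)
train, val = split(list(range(10)))
print(matrix, len(train), len(val))
```

Words missing from the pre-trained vectors (here the made-up entry "weaksauce") keep an all-zero row, as does row 0, which is reserved for padding.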
Experimentation – Part 3
8. Understanding results
o Layers connection
o Model configuration
o Model weights
9. Saving model configuration, weights, data-model
o HDF5 is a data model, library, and file format for storing and managing data
Experimentation – Part 4
10. Model Metrics and Performance
o Getting Model Metrics
o Model Performance Graph
o Model Accuracy
• Training
• Validation
11. Prediction
o Validation Data
o User Input
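For the prediction step, model outputs have to be mapped back to labels. The sketch below assumes the binomial model emits a single probability for the positive class and the multinomial model emits five probabilities, one per star rating; both output shapes and the function names are assumptions, not taken from the talk's code.

```python
def predict_sentiment(probability, threshold=0.5):
    """Map the positive-class probability to a sentiment label."""
    return "Positive" if probability >= threshold else "Negative"


def predict_stars(probabilities):
    """Return the 1-based star rating with the highest probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best + 1


# Hypothetical model outputs for one user sentence.
sentiment = predict_sentiment(0.83)
stars = predict_stars([0.05, 0.10, 0.15, 0.50, 0.20])
print(sentiment, stars)
```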
What if you get the exact same prediction every time?
• Bad model - The model may simply be bad. Retrain it.
• Rebalance your dataset:
o Either upsample the less frequent class
o Or downsample the more frequent one.
• Adjust class weights: With a higher class weight for the less frequent class, the network pays more attention to that class during training.
• Increase the training time: After a long training time the network starts concentrating more on the less frequent classes.
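One common way to choose class weights is to make them inversely proportional to class frequency. The formula below (total / (number of classes × class count)) is a widely used heuristic, not necessarily the one used in the talk, and the label counts are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Hypothetical imbalanced sentiment labels: 8 positive (1), 2 negative (0).
labels = [1] * 8 + [0] * 2
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary in this shape can be passed to Keras `fit` through its `class_weight` argument.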
Advanced Processing
• Engine:
o Doc2vec - https://radimrehurek.com/gensim/models/doc2vec.html
o Seq2seq - https://github.com/farizrahman4u/seq2seq
o Lda2vec - http://multithreaded.stitchfix.com/blog/2016/05/27/lda2vec/
o RNN & LSTM - https://arxiv.org/pdf/1502.06922.pdf
• Training
o CPU vs GPU
o Checkpoints with training
AnswerBot production pipeline in cloud (AWS)
[Diagram: a machine learning pipeline prototype to get the top N matching answers. Labels: Community, Stackoverflow, Reddit, Quora, Slack, Bot, Support Portal; AWS API Gateway, AWS Lambda (Question Scoring), S3, DynamoDB, AWS SQS, AWS SNS; Scoring Pipeline, Model Preparation Process, Model Production]
Content
• Github - https://github.com/Avkash/mldl/tree/master/tensorbeat-answerbot
• Dataset
o Sentiment : Search it or Mine it
o 5Star - https://www.yelp.com/dataset_challenge/dataset
• Python/Jupyter Notebook
o Sentiment:
• make-sentiment-model.py
• PositiveNegative.ipynb
o 5Star:
• make-5star-model.py
• 5StarReviews.ipynb
o Prediction – PredictNow.py
Thank you so much

 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Creating AnswerBot with Keras and TensorFlow (TensorBeat)

  • 2. Introduction • Avkash Chauhan, H2O.ai o Head of enterprise products and customers o @avkashchauhan | https://www.linkedin.com/in/avkashchauhan • Products o H2O o Sparkling Water o Deep Water • NN – Tensorflow, mxnet, Caffe • GPU • xgboost - Distributed
  • 3. What is an AnswerBot? • An AnswerBot is a standalone intelligent application • AnswerBot uses machine learning to respond to user input • Provides relevant knowledge base articles as answers • Enables a self-service customer base • Raises awareness of knowledge base offerings • Generates product feedback silently
  • 5. AnswerBot – Result Interface • Possible Answers: 60% (More..), 42% (More..)
  • 6. AnswerBot – Administrator Interface • Question • Tags • Sentiment: Positive / Negative • Priority: Low / Medium / High / Critical • Sex: Male / Female • Ratings • Top (n) Answers: 728 35%, 728 27%, 718 17%, 800 13%, 128 3%; 128 20%, 18 20%, 621 20%, 801 20%, 1208 20% • NSFW: 3
  • 8. Problems to solve • Finding proper tags • Finding & removing NSFW words • Sentiment in the question (positive or negative) • Priority to find the answer (low, medium, high, critical) • Can we figure out if the questioner is male or female? • Question rating (how well was the question written?) • Finding the best available answers • Duplicate questions
  • 9. Problems to solve – Solutions (Part 1) 1. Finding proper tags: 1. Word embeddings 2. Matching words 2. Finding & removing NSFW words: 1. Brute-force search 2. NLTK stop words 3. Sentiment in the question (positive or negative): 1. Binomial (2 classes) classification 1. Tree-based algorithms (GBM/RF/DRF) or NN
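The brute-force search for NSFW words can be sketched in a few lines of plain Python. This is only an illustration, not the deck's implementation: the blocklist `BLOCKED_WORDS` and the helper `remove_blocked_words` are hypothetical names, and the two placeholder words stand in for a real curated list.

```python
import re

# Hypothetical blocklist; a real system would load a curated word list.
BLOCKED_WORDS = {"darn", "heck"}


def remove_blocked_words(question, blocked=BLOCKED_WORDS):
    """Return the cleaned question and the number of blocked words removed."""
    kept, removed = [], 0
    for token in question.split():
        # Compare on a lowercased form of the token with punctuation stripped.
        normalized = re.sub(r"[^\w]", "", token).lower()
        if normalized in blocked:
            removed += 1
        else:
            kept.append(token)
    return " ".join(kept), removed


cleaned, count = remove_blocked_words("Why the heck does my darn model overfit?")
```

Returning the count alongside the cleaned text makes it easy to surface a per-question NSFW counter, such as the one shown in the administrator interface.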
  • 10. Problems to solve – Solutions (Part 2) 1. Priority to find the answer (Low/Medium/High/Critical) 1. Multinomial (4 classes) classification 1. Tree-based algorithms (GBM/RF/DRF) or NN 2. Can we figure out if the questioner is male or female? 1. Binomial (2 classes) classification 1. Tree-based algorithms (GBM/RF/DRF) or NN 3. Question rating (how well was the question written?) 1. Multinomial (N classes, 1-5 stars) classification 1. Tree-based algorithms or NN
  • 11. Problems to solve – Solutions (Part 3) 1. Finding the best available answers 1. Looking for the tags and keywords – Clustering / Reduction 2. Creating tag & keyword weights for each question 3. Matching tags, keywords and their weights to find top probabilities 2. Duplicate questions 1. Quora posed the same problem as a Kaggle competition 1. https://www.kaggle.com/c/quora-question-pairs/data 2. https://www.kaggle.com/anokas/data-analysis-xgboost-starter-0-35460-lb
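One way to read "matching tags, keywords and their weights to find top probabilities" is as a weighted-overlap score that is normalized across candidate answers. The sketch below assumes that reading; the functions `score_answer` and `top_n_answers`, the weights, and the answer ids are made up for illustration and are not taken from the deck's code.

```python
def score_answer(question_weights, answer_weights):
    """Sum the weight products for tags/keywords present in both."""
    shared = set(question_weights) & set(answer_weights)
    return sum(question_weights[t] * answer_weights[t] for t in shared)


def top_n_answers(question_weights, answers, n=2):
    """Return (answer_id, normalized_score) pairs, best match first."""
    scores = {aid: score_answer(question_weights, w) for aid, w in answers.items()}
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]
    if total == 0:
        # No overlap with any answer: report zero confidence for all.
        return [(aid, 0.0) for aid, _ in ranked]
    return [(aid, score / total) for aid, score in ranked]


# Hypothetical tag weights for one question and three knowledge base answers.
question = {"keras": 0.6, "gpu": 0.4}
answers = {
    728: {"keras": 0.5, "gpu": 0.5},
    718: {"keras": 1.0},
    800: {"xgboost": 1.0},
}
ranked = top_n_answers(question, answers, n=2)
```

Dividing by the total score turns the raw overlaps into shares that sum to 1 across all candidates, which is one simple way to present the result as percentages.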
  • 12. Data Preparation • Real Data o Real Question/Answers • StackOverflow, Community, Quora, Support System • Experimental Data o Yelp – 41M reviews in 1-5 stars category - Supervised • Ratings: 1-5 o Twitter Sentiment – Search it OR Mine It - Supervised • Positive/Negative • Male/Female
  • 13. Our Experimentation Today • Classifying sentences to predict o Ratings: Stars (1-5) • Multinomial classification example o Sentiments: Positive or Negative • Binomial classification example
  • 14. Demo • Binomial & Multinomial Classification $ python PredictNow.py
  • 15. Why Keras? • High-level Python API that runs on top of TensorFlow and Theano • Great for quick experimentation • Supports both CNNs and RNNs, and combinations of the two • Runs on CPU & GPU • Visit: https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html
  • 16. Word2vec • Word2vec is a neural-network-based word embedding method. • A neural network with only 1 linear hidden layer o The hidden layer transforms the inputs into something that the output layer can use. o Each hidden unit has a linear activation • Represents words in a continuous, low-dimensional vector space (i.e., the embedding space) o Semantically similar words are mapped to nearby points.
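"Nearby points" in the embedding space is usually measured with cosine similarity. The sketch below shows the idea on three toy vectors; the 3-dimensional values are invented for illustration and are not real word2vec output, and `nearest` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values); real vectors typically have 50-300 dimensions.
embeddings = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "slow": [0.0, 0.2, 0.9],
}


def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))
```

With these toy values, `nearest("good")` picks "great" because the two vectors point in almost the same direction, while "slow" is nearly orthogonal to both.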
  • 17. Understanding Dataset • Ratings Analysis o review,stars o The food is WAAAAY overpriced and totally not worth it, they charged for the salsa and the service was ridiculously slow....The guacamole was good though., 2 o Decent food at a great price. Unfortunately, the place is so jam packed it's almost an inconvenience to head back to the buffet lines., 2 o Love getting my haircut here! It's only $25 for a women's haircut. I'm pretty picky about how much my hair is layered and I've never had a problem here. Make sure to call in to schedule your appointment ahead of time during the school year because she's usually booked two days in advance., 5 • Sentiment Analysis o Text, Sentiment o I lost $80 today I know I shouldn't put things in my back pocket but I was about to put in my bag when I realized it was gone., 0 o Just got back from Seattle. Lots of crowds. Nordstrom was nuts. But Taphouse Grill was practically empty. Found hardcover of Mad Love!, 1 o Crunch week! This Friday, I'll be heading to Oddmall, my first major craft fair, in Hudson, Ohio! I'm tricking out the website., 1 o Another beautiful day out today!! Going to build some models first then go for running!! 1 o Tired. Just tired. Home time!! I'm weaksauce, I know , 0
  • 18. Components & Experimentation • Keras • TensorFlow o GPU • NLTK o Using stop words • GloVe o Pre-trained word vectors o Small (400K words) • Python • Jupyter notebook
  • 19. Experimentation – Part 1 1. Data preparation 2. Creating the word collection 1. Removing stop words 2. Collecting all words into a big list 3. Tokenization and uniform data collection 1. Using the full word collection 2. Getting the unique words in our collection 3. Tokenizing at sentence level 4. Final dataset 1. Sentences [sentences_per_record, length] – X 2. Labels [label_per_record, length] – Y
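The preparation steps above can be sketched without any libraries. This is a simplified stand-in, not the deck's code: the tiny `STOP_WORDS` set replaces NLTK's stop word list, the hypothetical helpers `clean`, `build_word_index` and `to_padded_sequence` replace the Keras tokenizer and padding utilities, and sequences are padded at the end here purely as an illustrative choice.

```python
# Small stand-in for NLTK's English stop word list (assumption).
STOP_WORDS = {"the", "is", "a", "and", "was", "it"}


def clean(sentence):
    """Lowercase, strip basic punctuation, and drop stop words."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return [w for w in words if w and w not in STOP_WORDS]


def build_word_index(sentences):
    """Map each unique word to an integer id starting at 1; 0 is padding."""
    index = {}
    for sentence in sentences:
        for word in clean(sentence):
            if word not in index:
                index[word] = len(index) + 1
    return index


def to_padded_sequence(sentence, word_index, length):
    """Encode a sentence as word ids, truncated or zero-padded to `length`."""
    ids = [word_index[w] for w in clean(sentence) if w in word_index]
    ids = ids[:length]
    return ids + [0] * (length - len(ids))


reviews = ["The food is good.", "The service was slow and the food was cold!"]
stars = [4, 2]

word_index = build_word_index(reviews)
X = [to_padded_sequence(r, word_index, length=4) for r in reviews]
Y = stars
```

Every row of `X` has the same length, which is what lets the sentences be stacked into a single array and fed to an embedding layer.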
  • 20. Experimentation – Part 2 4. Splitting the dataset into training and validation sets 5. Creating the embedding matrix o Loading the pre-trained word vectors o Finding matching words from our collection and building the embedding matrix 6. Creating the embedding layer/configuration 7. Training
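Steps 4 and 5 can be illustrated in plain Python as follows. This is a sketch under assumptions: `split_dataset` and `build_embedding_matrix` are hypothetical helpers, the 2-dimensional `pretrained` vectors are invented stand-ins for a loaded GloVe file, and words without a pre-trained vector are simply left as zero rows.

```python
import random


def split_dataset(X, Y, validation_fraction=0.2, seed=0):
    """Shuffle indices once, then split into training and validation parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(Y, train_idx), pick(X, val_idx), pick(Y, val_idx)


def build_embedding_matrix(word_index, pretrained, dim):
    """Row i holds the vector of the word with id i; row 0 is padding."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, row in word_index.items():
        vector = pretrained.get(word)
        if vector is not None:  # words missing from the pre-trained set stay zero
            matrix[row] = list(vector)
    return matrix


# Invented 2-d vectors standing in for pre-trained word vectors.
pretrained = {"food": [0.1, 0.2], "slow": [0.3, 0.4]}
word_index = {"food": 1, "good": 2, "slow": 3}
embedding_matrix = build_embedding_matrix(word_index, pretrained, dim=2)

x_train, y_train, x_val, y_val = split_dataset(list(range(10)), list(range(10)))
```

A matrix built this way is typically passed to the embedding layer as its initial weights, with the layer frozen if the pre-trained vectors should not be updated during training.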
  • 21. Experimentation – Part 3 8. Understanding results o Layers connection o Model configuration o Model weights 9. Saving model configuration, weights, data-model o HDF5 is a data model, library, and file format for storing and managing data
  • 22. Experimentation – Part 4 10. Model Metrics and Performance o Getting Model Metrics o Model Performance Graph o Model Accuracy • Training • Validation 11. Prediction o Validation Data o User Input
  • 23. What if you get the exact same prediction every time? • Bad model: the model itself could be bad. Retrain it. • Rebalance your dataset: o Either upsample the less frequent class o Or downsample the more frequent one. • Adjust class weights: with a higher class weight set for the less frequent class, the network pays more attention to that class during training. • Increase the training time: after a long training time the network starts concentrating more on the less frequent classes.
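Class weights are often set inversely proportional to class frequency. The sketch below uses the common "balanced" heuristic, n_samples / (n_classes * class_count); the function name `balanced_class_weights` and the label counts are illustrative assumptions, not values from the deck.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Imbalanced toy labels: eight positive (1) and two negative (0) examples.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
weights = balanced_class_weights(labels)
```

The resulting dictionary maps each label to its weight, which is the shape that the `class_weight` argument of Keras `fit` expects. Here the rare class 0 receives four times the weight of class 1.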
  • 24. Advanced Processing • Engine: o Doc2vec - https://radimrehurek.com/gensim/models/doc2vec.html o Seq2seq - https://github.com/farizrahman4u/seq2seq o Lda2vec - http://multithreaded.stitchfix.com/blog/2016/05/27/lda2vec/ o RNN & LSTM - https://arxiv.org/pdf/1502.06922.pdf • Training o CPU vs GPU o Checkpoints with training
  • 25. AnswerBot production pipeline in cloud (AWS) Community Stackoverflow Reddit Quora Slack Bot AWS API Gateway AWS Lambda (Question Scoring) S3 DynamoDB AWS SQS A ML pipeline prototype to get top N matching answers AWS SNS Scoring Pipeline Model Preparation Process Model Production Support Portal
  • 26. Content • Github - https://github.com/Avkash/mldl/tree/master/tensorbeat-answerbot • Dataset o Sentiment: Search it or Mine it o 5Star - https://www.yelp.com/dataset_challenge/dataset • Python/Jupyter Notebook o Sentiment: • make-sentiment-model.py • PositiveNegative.ipynb o 5Star: • make-5star-model.py • 5StarReviews.ipynb o Prediction: PredictNow.py
  • 27. Thank you so much