NLP & Machine Learning - An Introductory Talk
Vijay Ganti
About Me
• I am an amateur programmer and ML enthusiast
• I am developing NLP prototype systems for problems
that I find interesting, and have used models like Naive
Bayes and LDA for topic modeling of HTML data.
• I code in Python
• I have developed a deep love for solving stimulating
problems, and since I also like writing, I am intrigued by
the question “can good or great writing be detected, or
one day created, by ML/AI?”
• An amateur is someone who does something for love
“A journey of a thousand miles
begins with a single step”
- Laozi (604 BC - 531 BC), a contemporary of
Confucius
Agenda
Why NLP & ML?
What is NLP?
Getting started with NLP & ML
Why Python?
Making it real with an NLP & ML coding demo
A program that predicts gender given name(s) as input
A glimpse into some practical issues
Next Steps
NLP powered by ML is ripe for
changing the way business gets done!
• Conversational agents are becoming an important form
of human-computer communication (Customer
support interactions using chat-bots)
• Much of human-human communication is now
mediated by computers (Email, Social Media,
Messaging)
• An enormous amount of knowledge is now available in
machine readable form as natural language text (web,
proprietary enterprise content)
My meetup calendar is buzzing with NLP & ML.
So is theirs, and his, and all these folks'.
So what is NLP?
Get machines to understand human language
Segmentation (words, sentences, stemming)
Part of speech tagging
Named Entity Recognition
Disambiguation (Semantics and Context)
Document/Text Classification (e.g. topic modeling) ...
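To make the first task in the list concrete, here is a minimal sketch of segmentation using only the Python standard library. The splitting rules and the function names `split_sentences` and `tokenize` are illustrative assumptions, not what the talk's demo used; real toolkits such as NLTK handle abbreviations, quotes and other edge cases that these naive rules miss.

```python
import re

def split_sentences(text):
    # Naive rule (an assumption for illustration): a sentence ends at
    # '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Treat runs of letters, digits or apostrophes as words; lowercase them.
    return re.findall(r"[A-Za-z0-9']+", sentence.lower())

text = "I ate spaghetti with a fork. Then I went home!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens)
```

Stemming, part-of-speech tagging and the other tasks build on this kind of segmented output.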
Disambiguation in language is easy for us
but hard for machines
Sentence                          Relation
I ate spaghetti with meatballs    ingredient
I ate spaghetti with salad        side dish
I ate spaghetti with abandon      feeling
I ate spaghetti with a fork       instrument
I ate spaghetti with a friend     company
A few years back we faced the disambiguation problem
with images. This was one time I wanted polarization,
but the machines couldn't tell the difference!
Old vs New NLP
Old (Rule Based)     New (Machine Learning Based)
Deterministic        Probabilistic
Hard boundaries      Soft boundaries
Fixed                Malleable
What do you need to become good at
NLP & ML (based on experience)?
• Coding
• Probability Theory & Statistical Inference Theory
• Algorithm theory, both for tweaking models and for building
scalable implementations
• Look for problems to solve end-to-end and soak in
large amounts of data (data are everywhere)
Pick up Machine Learning & Distributed Computing topics as
needed
ref: https://www.linkedin.com/pulse/20141114072915-11846569-what-it-takes-to-be-a-data-scientist-advice-from-a-non-data-scientist?trk=mp-reader-card
Why should I study probability? We have
all tossed coins and played card games!
Outcomes are highly non-intuitive
Required to combat our primitive intuition &
build sophisticated “intuition”
EXAMPLES?
Google “Birthday Problem” to see an
example
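The Birthday Problem mentioned above can be checked in a few lines. This is a minimal sketch that assumes 365 equally likely birthdays and ignores leap years; the function name is made up for illustration.

```python
def birthday_collision_probability(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming birthdays are uniform over `days` days."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

for n in (10, 23, 50):
    print(n, round(birthday_collision_probability(n), 3))
```

With only 23 people the probability already passes 50%, which is the kind of result raw intuition tends to get wrong.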
Why Python for NLP & ML
Easy to get productive quickly
Easy to access and “pre-process” text data
Interpreted so great for research productivity
Support for higher order abstractions and programming
paradigms (declarative/functional, object oriented)
Rich ecosystem with many modules for data science
and NLP
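As a small illustration of the points above, the sketch below pre-processes a string and counts word frequencies with nothing but the standard library. The stopword list and the function name `word_frequencies` are illustrative assumptions; a real project would use a fuller stopword list.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "is", "of", "and", "in"}

def word_frequencies(text):
    # Lowercase, pull out words, and drop stopwords in one generator expression.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

freq = word_frequencies("The cat and the hat. The cat is in the hat!")
print(freq.most_common(2))
```

The generator expression and `Counter` are examples of the higher-order abstractions that keep this kind of code short.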
Getting started with NLP & ML & some
foundational probability theory in Python
• Coursera course on Python Data Structures
• Some basic Python - Google Lectures on Python
(https://developers.google.com/edu/python/)
• NLTK - nltk.org
• Get other packages as needed like NumPy, Matplotlib,
Scikit-learn, PyBrain, pandas, IPython
• Natural Language Processing with Python (book)
• http://norvig.com/ngrams/ch14.pdf
• Azure Text Analytics API (I haven't tried it, but it looks
promising)
• http://stats.stackexchange.com/
• https://www.quora.com
Coding time to demonstrate
the ML workflow
A simple gender prediction problem, solved
interactively with a Naive Bayes classifier,
to show the ML workflow and the
importance of feature engineering
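The live demo's code is not part of these slides, so what follows is only a sketch of what such a classifier could look like: a from-scratch Naive Bayes with add-one smoothing, written with the standard library. The twelve training names, the two suffix features and all function names are hypothetical choices made for illustration; a real demo would train on a large names corpus.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical training set, for illustration only.
TRAIN = [
    ("Anna", "female"), ("Maria", "female"), ("Julia", "female"),
    ("Sophia", "female"), ("Emily", "female"), ("Lucy", "female"),
    ("John", "male"), ("David", "male"), ("Robert", "male"),
    ("Kevin", "male"), ("Mark", "male"), ("Brian", "male"),
]

def features(name):
    """Feature engineering: describe a name by its last one and two letters."""
    name = name.lower()
    return {"last": name[-1], "last2": name[-2:]}

def train(examples):
    class_counts = Counter()
    feat_counts = defaultdict(Counter)  # (label, feature) -> value counts
    vocab = defaultdict(set)            # feature -> values seen in training
    for name, label in examples:
        class_counts[label] += 1
        for feat, value in features(name).items():
            feat_counts[(label, feat)][value] += 1
            vocab[feat].add(value)
    return class_counts, feat_counts, vocab

def predict(model, name):
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for feat, value in features(name).items():
            count = feat_counts[(label, feat)][value]
            # Add-one smoothing keeps unseen values from zeroing the score.
            score += math.log((count + 1) / (n + len(vocab[feat]) + 1))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(predict(model, "Olivia"), predict(model, "Martin"))
```

Changing what `features` returns (first letter, name length, vowel count) and re-running is a quick way to see how much the choice of features matters.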
Supervised classification workflow
Training: Training Data -> Feature Extraction -> ML Algo
Prediction: Data -> Feature Extraction -> ML Algo -> Prediction
Practical issues seen in our example
Curse of dimensionality (too many features hurt rather than help)
Overfitting (sparse data for some features)
Scaling
More data vs better algorithms
More data is better than a better
algorithm
Source: Michele Banko and Eric Brill, “Scaling to Very Very Large Corpora for
Natural Language Disambiguation”, Microsoft Research, 1 Microsoft Way,
Redmond, WA 98052, USA
Practical lessons learned so far
Data preparation is 70% of the work
Feature Engineering is 70% of the rest of the work
Domain expertise critical for feature engineering
Modeling is more about understanding the concepts so
that you use them correctly.
The theory is hard to understand, so don’t try to learn it
all at once. Instead, pick up topics as needed and ask for
help.
Next Steps
Think of use cases that will add most value for a customer
Think deeply about the domain, not the models
Think about the data deeply (acquisition, format,
processing etc.)
Contact me to discuss problems worth solving; we
can hack together: ganti.vijay1@gmail.com
Tweet to @vijayganti if you liked the talk and want more
“Ars longa, vita brevis”
which in English is
“Life is short, [the] craft long”
Hippocrates’ Parting Words of Caution
Backup Slides
Naive Bayes Classifier
P(A|B) = P(B|A) x P(A) / P(B)
P(Class|Feature) = P(Feature|Class) x P(Class) / P(Feature)
Posterior = Likelihood x Prior / Evidence
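A worked example of the formula above, using made-up numbers: suppose half of all names are female, 40% of female names end in "a", and 2% of male names end in "a". These probabilities are assumptions chosen for illustration, not measured values.

```python
# Hypothetical probabilities, for illustration only.
p_female = 0.5            # prior P(Class = female)
p_male = 0.5              # prior P(Class = male)
p_a_given_female = 0.40   # likelihood P(ends in 'a' | female)
p_a_given_male = 0.02     # likelihood P(ends in 'a' | male)

# Evidence P(Feature): total probability of seeing a name ending in 'a'.
p_a = p_a_given_female * p_female + p_a_given_male * p_male

# Posterior P(female | ends in 'a') = likelihood x prior / evidence.
posterior_female = p_a_given_female * p_female / p_a
print(round(posterior_female, 3))
```

Under these assumed numbers, a name ending in "a" is female with probability of about 0.95.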
Naive Bayes Classifier
What is independence?
Naive Bayes assumes that features are independent of each other
given the class. In NLP, let’s say you are using word frequency as a
feature, but words like
United States
Damn good
Stainless steel
aren’t independent words. They often occur together. Hence you
can get better classification accuracy if your initial processing uses
something called “collocation” to treat them as one unit.
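A minimal sketch of treating collocations as one unit before counting words. The fixed `COLLOCATIONS` set and the function name `merge_collocations` are assumptions made for illustration; in practice collocations are usually discovered statistically from a corpus, for instance with NLTK's collocation finders, rather than listed by hand.

```python
# Hand-picked word pairs to merge (an illustrative assumption).
COLLOCATIONS = {("united", "states"), ("stainless", "steel"), ("damn", "good")}

def merge_collocations(tokens, collocations=COLLOCATIONS):
    """Join known adjacent word pairs into a single token."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in collocations:
            merged.append("_".join(pair))
            i += 2  # skip both words of the pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = "the united states exports stainless steel".split()
print(merge_collocations(tokens))
```

After merging, "united_states" is counted as a single feature, so the classifier no longer scores "united" and "states" as if they were independent pieces of evidence.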
