William McKnight
www.mcknightcg.com
214-514-1444
Natural Language Processing
Strategies
#AdvAnalytics
@williammcknight
Language
• There are 6000+ distinct languages on Earth
• Languages spread and shrink
• English is especially difficult
Graphic credit: Minna Sundberg
Computers are confused by language
• So NLP must incorporate
– Linguistics
– Theoretical Computer Science
– Math
– Statistics
– Artificial Intelligence
– Psychology
Linguistics
• Words have
– Intention (goals, shared knowledge, beliefs)
– Generation
– Synthetization
• Understanding is
– Perception
– Interpretation
– Incorporation
Analyzing Language Data
• Need Text Analysis and Natural Language Processing
• Text Analysis: Text mining or text analytics is the process of
deriving meaningful information from natural language
• Natural language processing refers to the artificial
intelligence methods of communicating with intelligent systems
using a natural language
Garden Path Sentences
• Don’t bother going.
• Don’t bother going early.
• Meet me at five.
• Meet me at five to four.
• The old man the boat.
• The prime number few.
• The man whistling tunes pianos.
• The complex houses married and single soldiers and their
families.
Consider the News
(CNN) — Researchers in Canada have released new images of a
remarkably well-preserved shipwreck that will shed new light on the
ill-fated 1845 Arctic expedition in which famed British explorer John
Franklin died.
The wreck of HMS Terror has effectively been "frozen in time" thanks to
the cold, deep waters of Terror Bay in Nunavut, Canada, and a layer of
silt which has preserved artifacts such as maps, logs and scientific
instruments, according to a study by Parks Canada in conjunction with
Inuit researchers.
HMS Terror and HMS Erebus set off from England in 1845 in search of a
route across the North-West Passage but got stuck in sea ice, forcing the
129 crew members to abandon ship in 1848. The men died one by one
attempting to walk to safety across the Arctic.
Where does text come from?
• Internet chat, blogs, reviews, wikis, scientific papers,
medical records, books
– All present specific challenges
Natural Language Processing is the study of the
computational treatment of natural language
NLP: 2 Sides
• Understanding
– Mapping the given input in natural language into useful
representations
– Analyzing different aspects of the language
• Generation
– Text planning − Retrieving the relevant content from knowledge
base
– Sentence planning − Choosing required words, forming
meaningful phrases, setting tone of the sentence
– Text Realization − Mapping sentence plan into sentence structure
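The three generation stages above can be sketched as a small template-based pipeline. This is only an illustration under stated assumptions: the knowledge base, the order fields, and the function names `plan_text`, `plan_sentence`, and `realize` are all hypothetical, and real generation systems are far more elaborate.

```python
# Hypothetical knowledge base of facts about an order (illustrative data only).
KNOWLEDGE_BASE = {
    "order-17": {"customer": "Ana", "item": "laptop", "status": "shipped", "days": 2},
}

def plan_text(order_id):
    """Text planning: retrieve the relevant content from the knowledge base."""
    return KNOWLEDGE_BASE[order_id]

def plan_sentence(facts, tone="formal"):
    """Sentence planning: choose the words and set the tone for the facts."""
    greeting = "Dear" if tone == "formal" else "Hi"
    verb = "has been shipped" if facts["status"] == "shipped" else "is being processed"
    return {
        "greeting": greeting,
        "name": facts["customer"],
        "subject": f"your {facts['item']}",
        "verb": verb,
        "days": facts["days"],
    }

def realize(plan):
    """Text realization: map the sentence plan onto a sentence structure."""
    unit = "day" if plan["days"] == 1 else "days"
    sentence = f"{plan['subject']} {plan['verb']} and should arrive in {plan['days']} {unit}."
    return f"{plan['greeting']} {plan['name']}, {sentence}"

message = realize(plan_sentence(plan_text("order-17")))
```

Keeping the stages as separate functions means the tone can change (for example `tone="casual"`) without touching content retrieval or sentence structure.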
Enterprise Applications of NLP 1/3
– Querying Image Content
– Customer Service and Marketing Virtual Digital
Assistants
– Patent Research and Analysis
– Automated Report Generation
– Patient Data Processing
– Converting Paperwork into Digital Data
– Automated Code Development
– Contract Analysis
– Automated CliffsNotes, Study Notes, and Quiz
Generation
– Intelligent Recruitment and Human Resources
Systems
– Sentiment Analysis
– Healthcare Virtual Digital Assistants
– Sentiment Analysis for Psychoanalysis
– Business Application Virtual Digital Assistants
– E-Commerce and Sales Virtual Digital Assistants
– Banking and Financial Services
– Automating Food and Beverage Ordering
– Social Media Feed Curation
– Language Translation Services
– Predictive Typing Assistant
– Education for Autistic and Speech Deficient Children
– Automated Grading
– Text Classification and Mining for Biomedical
Literature
– Mining, Processing, and Making Sense of Clinical
Notes
– Film Script Analysis
– Dialect Classification
– Hospital Patient Management System
– Real-Time News Analysis and Competitive
Intelligence
– Automated Tour Guide and Itinerary Service
Enterprise Applications of NLP 2/3
• Customer service
– NLP technologies today are smart enough to transcribe and analyze
the massive recorded call data that enterprise databases contain.
– The most prominent applications of NLP are in customer support.
• Reputation management
– Social media platforms have become important
– Consumers actively participate in reviewing their brand experiences
and posting interactions with businesses
– Analyze content across social media platforms and tell you the
sentiment being conveyed about your brand — positive, negative, or
neutral.
– Provide real-time updates available in dashboards
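A minimal sketch of the positive, negative, or neutral labeling described above, assuming a simple word-count approach. The two word lists are hypothetical and tiny. Production systems typically rely on much larger lexicons or trained models.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"terrible", "hate", "slow", "broken", "rude"}

def classify_sentiment(post):
    """Label a post positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "I love the fast delivery!",
    "Support was rude and the app is broken.",
    "The store opens at nine.",
]
labels = [classify_sentiment(p) for p in posts]
```

A counter like this ignores negation and sarcasm ("not great" would still score as positive), which is one reason real brand monitoring needs more than keyword matching.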
Enterprise Applications of NLP 3/3
• Personalized Advertising
– Traditionally, enterprises have relied upon demographics and psychographic
variables to segment their markets for targeted advertising
– Search engine browsing and social media activity
– Identifying patterns in unstructured data spread across several web platforms
– Segment users into highly nuanced groups, called personas
• Market and Product intelligence
– “Event extraction” is an NLP technique that parses text to mine information
about specific events
– Mergers and acquisitions, key takeovers, changes in the board of directors, key job
role changes — any kind of event can be identified by an NLP algorithm
– This can create a structured database of event information about companies, which
is invaluable for an enterprise
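As a rough sketch of how event extraction can turn text into structured records, the following matches one hand-written acquisition pattern with a regular expression. The pattern, the record fields, and the company names are all hypothetical. Real event extraction relies on parsing and trained models rather than a single regex.

```python
import re

# Hypothetical pattern: "<Acquirer> acquired|acquires|to acquire <Target>",
# where each company name is a run of capitalized words.
ACQUISITION = re.compile(
    r"(?P<acquirer>[A-Z]\w+(?: [A-Z]\w+)*) "
    r"(?:acquired|acquires|to acquire) "
    r"(?P<target>[A-Z]\w+(?: [A-Z]\w+)*)"
)

def extract_acquisitions(text):
    """Return one structured record per acquisition mention found in the text."""
    return [
        {"event": "acquisition", "acquirer": m.group("acquirer"), "target": m.group("target")}
        for m in ACQUISITION.finditer(text)
    ]

news = "In March, Acme Corp acquired Widget Works. Later, Globex to acquire Initech."
events = extract_acquisitions(news)
```

Each record could then be written as a row in an event table keyed by company.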
NLP is Getting Better
• Translation Accuracy
• Self-Supervised Pre-Training
• Speech recognition
• Natural language interpretation
• Machine translation
• Sentiment analysis
Widely used NLP Benchmarks
• GLUE
• RACE
• SuperGLUE
Steps in NLP
• Tokenization
• Stemming
• Lemmatization
• Part of Speech Tagging
• Named Entity Recognition
• Chunking
Tokenization
• The process of segmenting running text into words and
sentences.
• Text needs to be segmented into linguistic units such as words,
punctuation, numbers, alphanumeric, etc.
• In English, words are often separated from each other by blanks
(white space), but not all white space is equal.
• Tokenization is an identification of basic units to be processed.
• Identifying the units that do not need to be further
decomposed for subsequent processing is an extremely
important step.
Steps in Tokenization
• Segmenting Text into Words
• Handling Abbreviations
• Handling Hyphenated Words
• Numerical and special expressions
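The tokenization steps above can be sketched with Python's standard `re` module. The abbreviation list and the regular expression are illustrative assumptions, not a complete tokenizer. Libraries such as spaCy handle many more cases.

```python
import re

# Hypothetical abbreviation list; a real tokenizer would use a much larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "u.s.", "e.g.", "i.e."}

# One alternative per kind of token, tried left to right.
TOKEN_PATTERN = re.compile(
    r"""
    \$?\d+(?:[.,]\d+)*%?      # numerical expressions such as 1,000 or 3.50
    | \w+(?:[-']\w+)*         # words, including hyphenated words
    | [^\w\s]                 # any single punctuation mark
    """,
    re.VERBOSE,
)

def tokenize(text):
    """Split running text into word, number, and punctuation tokens."""
    tokens = []
    for chunk in text.split():
        # Keep known abbreviations whole so their period is not split off.
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)
        else:
            tokens.extend(TOKEN_PATTERN.findall(chunk))
    return tokens

tokens = tokenize("Dr. Smith paid $3.50 for a well-preserved map.")
```

In this sketch the period in "Dr." stays attached to the abbreviation, while the sentence-final period becomes its own token.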
Stemming and Lemmatization
• The goal of both stemming and lemmatization is to reduce
inflectional forms and sometimes derivationally related forms of
a word to a common base form
• Stemming refers to a crude heuristic process that chops off the
ends of words in the hope of achieving this goal correctly most
of the time
• Lemmatization refers to doing things properly with the use of a
vocabulary and morphological analysis of words, normally
aiming to remove inflectional endings only and to return the
base or dictionary form of a word, which is known as the lemma
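To make the contrast concrete, here is a minimal sketch of both approaches. The suffix list is a crude stand-in for a real stemmer (it is not the Porter algorithm), and `LEMMA_TABLE` is a hypothetical miniature vocabulary standing in for full morphological analysis.

```python
# Suffixes tried in order; an illustrative subset only.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical miniature vocabulary mapping inflected forms to lemmas.
LEMMA_TABLE = {
    "was": "be", "is": "be", "are": "be",
    "better": "good",
    "mice": "mouse",
    "studies": "study",
    "running": "run",
}

def crude_stem(word):
    """Chop off the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in the vocabulary; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)

comparison = {w: (crude_stem(w), lemmatize(w)) for w in ["studies", "running", "was", "mice"]}
```

The stemmer returns non-words such as "studi" and "runn" and cannot relate "was" to "be", while the lookup returns dictionary forms but only for words its vocabulary covers.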
Part of Speech Tagging
• Part-of-speech tagging (POS tagging) is the task of tagging
a word in a text with its part of speech.
• A part of speech is a category of words with similar
grammatical properties.
• Common English parts of speech are noun, verb, adjective,
adverb, pronoun, preposition, conjunction, etc.
POS Tagging
• The runner is preparing to start his last race.
• Start = verb or noun?
• Last = noun or adjective?
• Race = verb or noun?
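One way to see how context resolves these ambiguities is a toy rule-based tagger. The lexicon, the tag names, and the three context rules below are assumptions made for this sentence only. Practical taggers are statistical or neural and learn such preferences from annotated text.

```python
# Hypothetical lexicon: each word maps to the tags it may take.
LEXICON = {
    "the": ["DET"], "his": ["DET"],
    "runner": ["NOUN"],
    "is": ["VERB"], "preparing": ["VERB"],
    "to": ["TO"],
    "start": ["VERB", "NOUN"],
    "last": ["ADJ", "NOUN", "VERB"],
    "race": ["NOUN", "VERB"],
}

def possible_tags(word):
    """Tags a word may take; unknown words default to NOUN."""
    return LEXICON.get(word.lower(), ["NOUN"])

def tag(tokens):
    """Tag left to right, using the previous tag and the next word to disambiguate."""
    tagged = []
    for i, token in enumerate(tokens):
        candidates = possible_tags(token)
        previous = tagged[-1][1] if tagged else None
        next_tags = possible_tags(tokens[i + 1]) if i + 1 < len(tokens) else []
        if len(candidates) == 1:
            choice = candidates[0]
        elif previous == "TO" and "VERB" in candidates:
            choice = "VERB"   # "to start": infinitive marker, then a verb
        elif previous == "DET" and "ADJ" in candidates and "NOUN" in next_tags:
            choice = "ADJ"    # "his last race": modifier before a possible noun
        elif previous in ("DET", "ADJ") and "NOUN" in candidates:
            choice = "NOUN"   # a determiner or adjective expects a noun
        else:
            choice = candidates[0]
        tagged.append((token, choice))
    return tagged

result = tag("The runner is preparing to start his last race".split())
```

Under these rules "start" is tagged as a verb, "last" as an adjective, and "race" as a noun.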
Named Entity Recognition
• Named Entity Recognition is a process where an algorithm
takes a string of text (sentence or paragraph) as input and
identifies relevant nouns (people, places, and organizations)
that are mentioned in that string.
• Named Entity Recognition can automatically scan entire
articles, tweets, research papers, etc. and reveal which are the
major people, organizations, and places discussed in them.
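A minimal sketch of the input and output of Named Entity Recognition, assuming a fixed lookup list (a gazetteer) rather than a trained model. The `GAZETTEER` entries and labels are hypothetical and chosen to match the news excerpt earlier in the deck.

```python
import re

# Hypothetical gazetteer; real systems learn entities from annotated data.
GAZETTEER = {
    "John Franklin": "PERSON",
    "Parks Canada": "ORGANIZATION",
    "Terror Bay": "PLACE",
    "Nunavut": "PLACE",
    "Canada": "PLACE",
    "England": "PLACE",
}

def find_entities(text):
    """Return (entity, label) pairs in order of appearance, preferring longer names."""
    # Try longer names first so "Parks Canada" wins over "Canada".
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(0), GAZETTEER[m.group(0)]) for m in pattern.finditer(text)]

sentence = "The wreck lies in Terror Bay in Nunavut, Canada, according to a study by Parks Canada."
entities = find_entities(sentence)
```

A lookup list cannot recognize names it has never seen, which is the main reason practical NER uses statistical or neural models.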
Parse Tree
Graphic credit: datascience.stackexchange.com
Named Entity Recognition Output
Chunking
• Chunking is also called shallow parsing
• Chunking is a process of extracting phrases, such as noun
phrases, from unstructured text
Chunking Challenge Examples
• Joe ate chicken with waffles.
• Joe ate chicken with Mary.
• Joe ate chicken with a knife.
• Joe ate chicken with fear.
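A small sketch of noun-phrase chunking over tokens that already carry part-of-speech tags. The tag names and the grouping rule (a run of determiners and adjectives followed by one or more nouns) are simplifying assumptions. The tags are supplied by hand here, not produced by a tagger.

```python
def noun_phrase_chunks(tagged_tokens):
    """Group determiners and adjectives followed by one or more nouns into chunks."""
    chunks, current = [], []
    has_noun = False
    for word, tag in tagged_tokens:
        if tag == "NOUN":
            current.append(word)
            has_noun = True
        elif tag in ("DET", "ADJ") and not has_noun:
            current.append(word)
        else:
            # Anything else closes the current chunk, if it contains a noun.
            if has_noun:
                chunks.append(" ".join(current))
            current, has_noun = [], False
            # A determiner or adjective can start the next chunk.
            if tag in ("DET", "ADJ"):
                current.append(word)
    if has_noun:
        chunks.append(" ".join(current))
    return chunks

# Hand-tagged version of one of the examples above.
tagged = [("Joe", "NOUN"), ("ate", "VERB"), ("chicken", "NOUN"),
          ("with", "ADP"), ("a", "DET"), ("knife", "NOUN")]
phrases = noun_phrase_chunks(tagged)
```

The chunker finds the phrases in each example but says nothing about what "with ..." attaches to: waffles accompany the chicken, Mary accompanies Joe, and a knife is the instrument. That attachment question is what makes the four sentences a challenge.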
NLP Open Source Libraries
• spaCy
• Textacy
• Neuralcoref
Build or Buy NLP
• Building your own NLP system from the ground up:
– Need engineer with NLP skills + other developers
– Cost: $x00,000+
– Time: months to years
– Usefulness: limited without major additional work
• Working with an experienced NLP vendor:
– Cost: $x0,000 (basic text analytics and visualization) to low
$x00,000+ (semi-custom NLP application)
– Time: weeks to months
– Usefulness: customized to your specific needs
NLP Vendors
• AppZen
• Automated Insights
• Cogito
• Lexalytics
• Luminoso
• M*Modal
• SmartLogic
• SyTrue
• Woebot
Data for NLP
• Does not fit neatly into tabular relational databases
• The most common use case for the data is agile data
discovery across an enterprise
– Text Analytics/NLP
• Look for
– Search capabilities
– Data management – quick ingest, no modeling required, secure
connections, easy self-service mashups, query operations
– Deployment options
NLP …
• Reduces the gap between human
and machine communication
• Automates processes and creates
operational efficiency
• Pushes the barriers of data analysis
by bringing unstructured data into
play
• Extends the capability of existing
business intelligence assets in the
enterprise
Second Thursday of
Every Month, at 2:00 ET
Presented by: William McKnight
President, McKnight Consulting Group
www.mcknightcg.com (214) 514-1444
#AdvAnalytics

ADV Slides: Natural Language Processing Strategies

  • 1. William McKnight www.mcknightcg.com 214-514-1444 Natural Language Processing Strategies #AdvAnalytics @williammcknight
  • 2. Language • There are 6000+ distinct languages on Earth • Languages spread and shrink • English is especially difficult Graphic credit: Minna Sundberg
  • 3. Computers are confused by language • So NLP must incorporate – Linguistics – Theoretical Computer Science – Math – Statistics – Artificial Intelligence – Psychology
  • 4. Linguistics • Words have – Intention (goals, shared knowledge, beliefs) – Generation – Synthetization • Understanding is – Perception – Interpretation – Incorporation
  • 5. Analyzing Language Data • Need Text Analysis and Natural Language Processing • Text Analysis: Text mining or text analytics is the process of deriving meaningful information from natural language • Natural language processing refers to the artificial intelligence methods of communicating better intelligence using the natural language
  • 6. Garden Path Sentences • Don’t bother going. • Don’t bother going early. • Meet me at five. • Meet me at five to four. • The old man the boat. • The prime number few. • The man whistling tunes pianos. • The complex houses married and single soldiers and their families.
  • 7. Consider the News (CNN) — Researchers in Canada have released new images of a remarkably well-preserved shipwreck that will shed new light on the ill- fated 1845 Arctic expedition in which famed British explorer John Franklin died. The wreck of HMS Terror has effectively been "frozen in time" thanks to the cold, deep waters of Terror Bay in Nunavut, Canada, and a layer of silt which has preserved artifacts such as maps, logs and scientific instruments, according to a study by Parks Canada in conjunction with Inuit researchers. HMS Terror and HMS Erebus set off from England in 1845 in search of a route across the North-West Passage but got stuck in sea ice, forcing the 129 crew members to abandon ship in 1848. The men died one by one attempting to walk to safety across the Arctic.
  • 8. Where does text come from? • Internet chat, blogs, reviews, wikis, scientific papers, medical records, books – All present specific challenges
  • 9. Natural Language Processing is the study of the computational treatment of natural language
  • 10. NLP: 2 Sides • Understanding – Mapping the given input in natural language into useful representations – Analyzing different aspects of the language • Generation – Text planning − Retrieving the relevant content from knowledge base – Sentence planning − Choosing required words, forming meaningful phrases, setting tone of the sentence – Text Realization − Mapping sentence plan into sentence structure
  • 11. Enterprise Applications of NLP 1/3 – Querying Image Content – Customer Service and Marketing Virtual Digital Assistants – Patent Research and Analysis – Automated Report Generation – Patient Data Processing – Converting Paperwork into Digital Data – Automated Code Development – Contract Analysis – Automated CliffsNotes, Study Notes, and Quiz Generation – Intelligent Recruitment and Human Resources Systems – Sentiment Analysis – Healthcare Virtual Digital Assistants – Sentiment Analysis for Psychoanalysis – Business Application Virtual Digital Assistants – E-Commerce and Sales Virtual Digital Assistants – Banking and Financial Services – Automating Food and Beverage Ordering – Social Media Feed Curation – Language Translation Services – Predictive Typing Assistant – Education for Autistic and Speech Deficient Children – Automated Grading – Text Classification and Mining for Biomedical Literature – Mining, Processing, and Making Sense of Clinical Notes – Film Script Analysis – Dialect Classification – Hospital Patient Management System – Real-Time News Analysis and Competitive Intelligence – Automated Tour Guide and Itinerary Service
  • 12. Enterprise Applications of NLP 2/3 • Customer service – NLP technologies today can transcribe and analyze the massive volumes of recorded call data that enterprise databases contain. – The most prominent applications of NLP are in customer support. • Reputation management – Social media platforms have become important channels for brand reputation – Consumers actively review their brand experiences and post about their interactions with businesses – NLP tools analyze content across social media platforms and report the sentiment being conveyed about your brand: positive, negative, or neutral – Results can be delivered as real-time updates in dashboards
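The positive, negative, or neutral labeling mentioned on this slide can be illustrated with a lexicon-based scorer. This is only a minimal sketch: the word lists, the `sentiment` function, and the sample posts are assumptions made up for illustration, and commercial tools generally rely on much larger lexicons or trained models.

```python
import re

# Hypothetical, deliberately tiny opinion lexicon (an assumption for illustration).
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "rude", "broken", "terrible"}
NEGATORS = {"not", "never", "no"}


def sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1, or 0
        # A negator directly before an opinion word flips its polarity.
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up sample posts.
posts = [
    "The support team was helpful and fast",
    "Delivery was not fast and the box arrived broken",
    "The package arrived on Tuesday",
]
for post in posts:
    print(sentiment(post), "|", post)
```

Counting words this way misses sarcasm, context, and domain-specific vocabulary, which is one reason sentiment analysis is treated as an NLP problem rather than a keyword search.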
  • 13. Enterprise Applications of NLP 3/3 • Personalized Advertising – Traditionally, enterprises have relied on demographic and psychographic variables to segment their markets for targeted advertising – NLP adds signals from search engine browsing and social media activity – Identifying patterns in unstructured data spread across several web platforms – Segmenting users into highly nuanced groups, called personas • Market and Product Intelligence – “Event extraction” is an NLP technique that parses text to extract information about specific events – Mergers and acquisitions, key takeovers, changes in the board of directors, key job role changes: any such event can be identified by an NLP algorithm – This can create a structured database of event information about companies, which is invaluable for an enterprise
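Event extraction, as described on this slide, turns free text into structured records. The sketch below uses one hand-written regular expression for acquisition events. The pattern, the `extract_acquisitions` function, and the company names are all hypothetical. Real systems typically combine entity recognition with parsing or trained models rather than relying on a single pattern.

```python
import re

# A capitalized name: one or more capitalized words, e.g. "Acme Corp".
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"

# Hypothetical pattern for one event type: "<buyer> acquired <target>".
ACQUISITION = re.compile(
    rf"(?P<buyer>{NAME}) (?:acquired|acquires|bought) (?P<target>{NAME})"
)


def extract_acquisitions(text):
    """Return one structured record per acquisition sentence found in text."""
    return [
        {"event": "acquisition", "buyer": m.group("buyer"), "target": m.group("target")}
        for m in ACQUISITION.finditer(text)
    ]


# Fictional companies, used only to exercise the pattern.
news = "Acme Corp acquired Widget Works in March. Later, Globex bought Initech."
for record in extract_acquisitions(news):
    print(record)
```

Each record could become a row in the structured event database the slide mentions. The pattern is brittle by design: a capitalized word directly before the buyer's name, for example, would be absorbed into the name.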
  • 14. NLP is Getting Better • Translation Accuracy • Self-Supervised Pre-Training • Speech recognition • Natural language interpretation • Machine translation • Sentiment analysis
  • 15. Widely used NLP Benchmarks • GLUE • RACE • SuperGLUE
  • 16. Steps in NLP • Tokenization • Stemming • Lemmatization • Part of Speech Tagging • Named Entity Recognition • Chunking
  • 17. Tokenization • The process of segmenting running text into words and sentences. • Text needs to be segmented into linguistic units such as words, punctuation, numbers, and alphanumeric strings. • In English, words are often separated from each other by blanks (white space), but not all white space is equal. • Tokenization identifies the basic units to be processed. • Identifying units that need no further decomposition for subsequent processing is an extremely important step.
  • 18. Steps in Tokenization • Segmenting Text into Words • Handling Abbreviations • Handling Hyphenated Words • Numerical and special expressions
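The four tokenization steps listed above can be sketched with a regular expression and a small abbreviation list. This is an illustrative sketch under stated assumptions, not a production tokenizer: the `ABBREVIATIONS` set, the pattern, and the `tokenize` function are hypothetical, and the libraries named later in the deck ship their own, far more thorough rules.

```python
import re

# Hypothetical, deliberately short abbreviation list (an assumption for illustration).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "e.g.", "i.e.", "etc.", "u.s."}

# Alternatives are tried in order, so the numeric rule wins over the word rule.
TOKEN_PATTERN = re.compile(
    r"""
    \d+(?:[.,:]\d+)*%?(?!\w)   # numerical expressions: 1,845  2:00  3.5%
  | \w+(?:[-']\w+)*            # words, keeping internal hyphens and apostrophes
  | \S                         # any other single character, such as punctuation
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text on white space, then split each piece into tokens."""
    tokens = []
    for piece in text.split():
        if piece.lower() in ABBREVIATIONS:
            tokens.append(piece)  # keep "Dr." whole, period included
        else:
            tokens.extend(TOKEN_PATTERN.findall(piece))
    return tokens


print(tokenize("Dr. Smith paid $3.50 for a well-preserved map at 2:00."))
```

In this sketch the final period becomes its own token while the period in "Dr." stays attached, which is the kind of distinction behind the remark that not all white space and punctuation are equal.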
  • 19. Stemming and Lemmatization • The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form • Stemming refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time • Lemmatization refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma
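The difference between the two approaches can be seen in a small side-by-side sketch. The suffix list and the lemma table below are hypothetical and tiny. The stemmer is a crude suffix chopper, not the Porter algorithm, and the lemmatizer is a bare dictionary lookup standing in for real morphological analysis.

```python
# Hypothetical suffix list for a crude stemmer, checked in this order.
SUFFIXES = ("ing", "ies", "ed", "es", "s")

# Hypothetical miniature vocabulary mapping inflected forms to their lemmas.
LEMMA_TABLE = {
    "studies": "study",
    "houses": "house",
    "running": "run",
    "was": "be",
}


def crude_stem(word):
    """Chop the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word):
    """Return the dictionary form if the word is in the vocabulary."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ("studies", "houses", "running", "was"):
    print(f"{w:10} stem={crude_stem(w):8} lemma={lookup_lemma(w)}")
```

The stemmer returns truncated strings such as "stud" and "runn" that are not dictionary words, while the lookup returns "study" and "run" and can also handle an irregular form like "was". The price is that the lemmatizer knows nothing about words outside its vocabulary.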
  • 20. Part of Speech Tagging • Part-of-speech tagging (POS tagging) is the task of tagging a word in a text with its part of speech. • A part of speech is a category of words with similar grammatical properties. • Common English parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, etc.
  • 21. POS Tagging • The runner is preparing to start his last race. • Start = verb or noun? • Last = noun or adjective? • Race = verb or noun?
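One way to see how the ambiguity in this sentence can be resolved is a toy tagger that looks at neighboring words. Everything here is an assumption for illustration: the mini-lexicon, the simplified tag names, and the three context rules. Practical taggers learn such preferences from annotated text instead of hand-written rules.

```python
# Hypothetical mini-lexicon: each word lists the tags it can take, most likely first.
# The tag set is simplified (for example, "his" is treated as a determiner).
LEXICON = {
    "the": ["DET"], "runner": ["NOUN"], "is": ["VERB"], "preparing": ["VERB"],
    "to": ["PART"], "start": ["NOUN", "VERB"], "his": ["DET"],
    "last": ["ADJ", "NOUN", "VERB"], "race": ["NOUN", "VERB"],
}


def tag(words):
    """Pick one tag per word using the previous tag and the next word."""
    tags = []
    for i, word in enumerate(words):
        candidates = LEXICON.get(word.lower(), ["NOUN"])  # unknown words default to NOUN
        prev = tags[-1] if tags else None
        if i + 1 < len(words):
            following = LEXICON.get(words[i + 1].lower(), ["NOUN"])
        else:
            following = []
        if prev == "PART" and "VERB" in candidates:
            choice = "VERB"  # "to start": the infinitive marker precedes a verb
        elif prev in ("DET", "ADJ") and "ADJ" in candidates and "NOUN" in following:
            choice = "ADJ"   # determiner + adjective + noun: "his last race"
        elif prev in ("DET", "ADJ") and "NOUN" in candidates:
            choice = "NOUN"  # a noun closes the phrase
        else:
            choice = candidates[0]
        tags.append(choice)
    return list(zip(words, tags))


for word, pos in tag("The runner is preparing to start his last race".split()):
    print(f"{word:10} {pos}")
```

Under these rules "start" is tagged as a verb because it follows "to", "last" as an adjective because it sits between a determiner and a possible noun, and "race" as a noun because it follows that adjective.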
  • 22. Named Entity Recognition • Named Entity Recognition is a process where an algorithm takes a string of text (a sentence or paragraph) as input and identifies the relevant nouns (people, places, and organizations) mentioned in that string. • Named Entity Recognition can automatically scan entire articles, tweets, research papers, etc. and reveal the major people, organizations, and places discussed in them.
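The input and output of named entity recognition can be illustrated with a gazetteer (a list of known names) applied to wording from the news story earlier in the deck. This is a minimal sketch: the `GAZETTEER` contents, the label names, and the `find_entities` function are assumptions, and statistical NER systems can also recognize names they have never seen, which a lookup table cannot.

```python
import re

# Hypothetical gazetteer; labels follow the slide's people, places, and organizations.
GAZETTEER = {
    "John Franklin": "PERSON",
    "Parks Canada": "ORGANIZATION",
    "Terror Bay": "PLACE",
    "Nunavut": "PLACE",
    "England": "PLACE",
    "Canada": "PLACE",
}

# Longest names first, so "Parks Canada" is preferred over the shorter "Canada".
_names = sorted(GAZETTEER, key=len, reverse=True)
ENTITY_PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in _names) + r")\b")


def find_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    return [(m.group(), GAZETTEER[m.group()]) for m in ENTITY_PATTERN.finditer(text)]


sentence = "A study by Parks Canada examined Terror Bay in Nunavut, Canada."
for entity, label in find_entities(sentence):
    print(f"{label:13} {entity}")
```

The example shows why longest-match ordering matters: "Canada" appears twice in the sentence, once as part of an organization name and once as a place.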
  • 25. Chunking • Chunking is also called shallow parsing or hierarchy of ideas • Chunking is a process of extracting phrases from unstructured text
  • 26. Chunking Challenge Examples • Joe ate chicken with waffles. • Joe ate chicken with Mary. • Joe ate chicken with a knife. • Joe ate chicken with fear.
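Chunking can be sketched as pattern matching over part-of-speech tags. The example below extracts noun phrases from one of the sentences above. It assumes the words are already tagged, and the one-letter tag codes, the noun-phrase pattern, and the `noun_phrase_chunks` function are hypothetical simplifications.

```python
import re

# One-letter codes for a simplified tag set, so a regex can scan the tag sequence.
TAG_CODES = {"DET": "D", "ADJ": "A", "NOUN": "N", "VERB": "V", "ADP": "P"}

# Noun phrase: an optional determiner, any number of adjectives, one or more nouns.
NP_PATTERN = re.compile(r"D?A*N+")


def noun_phrase_chunks(tagged):
    """Return the noun phrases found in a list of (word, tag) pairs."""
    codes = "".join(TAG_CODES.get(pos, "O") for _, pos in tagged)
    chunks = []
    for match in NP_PATTERN.finditer(codes):
        words = [word for word, _ in tagged[match.start():match.end()]]
        chunks.append(" ".join(words))
    return chunks


# Tags supplied by hand for this sketch.
sentence = [
    ("Joe", "NOUN"), ("ate", "VERB"), ("chicken", "NOUN"),
    ("with", "ADP"), ("a", "DET"), ("knife", "NOUN"),
]
print(noun_phrase_chunks(sentence))
```

The chunker finds "Joe", "chicken", and "a knife", but it says nothing about whether "with a knife" describes the eating or the chicken. It would produce the same shape of output for "with waffles" or "with Mary", which is the challenge the example sentences illustrate.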
  • 27. NLP Open Source Libraries • spaCy • Textacy • Neuralcoref
  • 28. Build or Buy NLP • Building your own NLP system from the ground up: – Need an engineer with NLP skills plus other developers – Cost: $x00,000+ – Time: months to years – Usefulness: limited without major additional work • Working with an experienced NLP vendor: – Cost: $x0,000 (basic text analytics and visualization) to low $x00,000+ (semi-custom NLP application) – Time: weeks to months – Usefulness: customized to your specific needs
  • 29. NLP Vendors • AppZen • Automated Insights • Cogito • Lexalytics • Luminoso • M*Modal • SmartLogic • SyTrue • Woebot
  • 30. Data for NLP • Does not fit neatly into tabular relational databases • The most common use case for the data is agile data discovery across an enterprise – Text Analytics/NLP • Look for – Search capabilities – Data management: quick ingest, no modeling required, secure connections, easy self-service mashups, query operations – Deployment options
  • 31. NLP … • Reduces the gap between human and machine communication • Automates processes and creates operational efficiency • Pushes the barriers of data analysis by bringing unstructured data into play • Extends the capability of existing business intelligence assets in the enterprise
  • 32. Second Thursday of Every Month, at 2:00 ET Presented by: William McKnight President, McKnight Consulting Group www.mcknightcg.com (214) 514-1444 #AdvAnalytics