In this webinar, we will introduce natural language processing (NLP) to the data professional who has a use case for NLP or would like to fit NLP into their environment. NLP is about the deep understanding of human language communication. This webinar is an introduction to the capabilities of NLP, an approach that is utilized across many different use cases, including computer-assisted coding, speech recognition, and machine translation.
This webinar introduces the concept of natural language processing, the steps in NLP, the challenges of NLP, and the core algorithms of word vectors, recurrent and recursive neural networks, and convolutional neural networks.
2. Language
• There are 6000+ distinct languages on Earth
• Languages spread and shrink
• English is especially difficult
Graphic credit: Minna Sundberg
3. Computers are confused by language
• So NLP must incorporate
– Linguistics
– Theoretical Computer Science
– Math
– Statistics
– Artificial Intelligence
– Psychology
4. Linguistics
• Words have
– Intention (goals, shared knowledge, beliefs)
– Generation
– Synthesis
• Understanding is
– Perception
– Interpretation
– Incorporation
5. Analyzing Language Data
• Need Text Analysis and Natural Language Processing
• Text Analysis: Text mining or text analytics is the process of
deriving meaningful information from natural language
• Natural language processing refers to artificial
intelligence methods for communicating with intelligent
systems using natural language
6. Garden Path Sentences
• Don’t bother going.
• Don’t bother going early.
• Meet me at five.
• Meet me at five to four.
• The old man the boat.
• The prime number few.
• The man whistling tunes pianos.
• The complex houses married and single soldiers and their
families.
7. Consider the News
(CNN) — Researchers in Canada have released new images of a
remarkably well-preserved shipwreck that will shed new light on the
ill-fated 1845 Arctic expedition in which famed British explorer John
Franklin died.
The wreck of HMS Terror has effectively been "frozen in time" thanks to
the cold, deep waters of Terror Bay in Nunavut, Canada, and a layer of
silt which has preserved artifacts such as maps, logs and scientific
instruments, according to a study by Parks Canada in conjunction with
Inuit researchers.
HMS Terror and HMS Erebus set off from England in 1845 in search of a
route across the North-West Passage but got stuck in sea ice, forcing the
129 crew members to abandon ship in 1848. The men died one by one
attempting to walk to safety across the Arctic.
8. Where does text come from?
• Internet chat, blogs, reviews, wikis, scientific papers,
medical records, books
– All present specific challenges
10. NLP: 2 Sides
• Understanding
– Mapping the given input in natural language into useful
representations
– Analyzing different aspects of the language
• Generation
– Text planning − Retrieving the relevant content from knowledge
base
– Sentence planning − Choosing required words, forming
meaningful phrases, setting tone of the sentence
– Text Realization − Mapping sentence plan into sentence structure
11. Enterprise Applications of NLP 1/3
– Querying Image Content
– Customer Service and Marketing Virtual Digital
Assistants
– Patent Research and Analysis
– Automated Report Generation
– Patient Data Processing
– Converting Paperwork into Digital Data
– Automated Code Development
– Contract Analysis
– Automated CliffsNotes, Study Notes, and Quiz
Generation
– Intelligent Recruitment and Human Resources
Systems
– Sentiment Analysis
– Healthcare Virtual Digital Assistants
– Sentiment Analysis for Psychoanalysis
– Business Application Virtual Digital Assistants
– E-Commerce and Sales Virtual Digital Assistants
– Banking and Financial Services
– Automating Food and Beverage Ordering
– Social Media Feed Curation
– Language Translation Services
– Predictive Typing Assistant
– Education for Autistic and Speech Deficient Children
– Automated Grading
– Text Classification and Mining for Biomedical
Literature
– Mining, Processing, and Making Sense of Clinical
Notes
– Film Script Analysis
– Dialect Classification
– Hospital Patient Management System
– Real-Time News Analysis and Competitive
Intelligence
– Automated Tour Guide and Itinerary Service
12. Enterprise Applications of NLP 2/3
• Customer service
– NLP technologies today are smart enough to transcribe and analyze
the massive recorded call data that enterprise databases contain.
– The most prominent applications of NLP are in customer support.
• Reputation management
– Social media platforms have become important
– Consumers actively participate in reviewing their brand experiences
and posting interactions with businesses
– NLP can analyze content across social media platforms and report the
sentiment being conveyed about your brand: positive, negative, or
neutral.
– Provide real-time updates available in dashboards
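As a rough illustration of the positive / negative / neutral labeling described above, here is a minimal lexicon-based sketch. The word lists and the `sentiment` function are assumptions made for this example; production systems typically rely on much larger lexicons or trained models.

```python
import re

# Tiny hypothetical word lists, for illustration only.
POSITIVE = {"great", "love", "excellent", "helpful", "fast"}
NEGATIVE = {"bad", "hate", "terrible", "slow", "rude"}

def sentiment(post):
    """Label a post by counting positive and negative words."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Love the new app, support was fast and helpful!"))
print(sentiment("Terrible service, the agent was rude."))
print(sentiment("I ordered a phone yesterday."))
```

A word-counting approach like this misses negation ("not helpful") and sarcasm, which is one reason real sentiment systems go well beyond simple lexicons.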
13. Enterprise Applications of NLP 3/3
• Personalized Advertising
– Traditionally, enterprises have relied upon demographics and psychographic
variables to segment their markets for targeted advertising
– Search engine browsing and social media activity
– Identifying patterns in unstructured data spread across several web platforms
– Segment users into highly nuanced groups, called personas
• Market and Product intelligence
– “Event extraction” is an NLP technique that parses text to mine information
about specific events
– Mergers and acquisitions, key takeovers, changes in the board of directors, key job
role changes — any kind of event can be identified by an NLP algorithm
– This can create a structured database of event information about companies, which
is invaluable for an enterprise
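To make the idea of turning text into a structured event record concrete, here is a minimal pattern-based sketch for a single event type (acquisitions). The pattern, the function name, and the company names are all made up for illustration; real event extraction generally builds on named entity recognition and parsing rather than one regular expression.

```python
import re

# Simplifying assumption: a company name is a run of capitalized words.
NAME = r"[A-Z][\w&]*(?:\s[A-Z][\w&]*)*"
ACQUISITION = re.compile(
    rf"(?P<buyer>{NAME})\s(?:acquired|acquires|bought)\s(?P<target>{NAME})"
)

def extract_acquisitions(text):
    """Return one structured record per acquisition mention found in the text."""
    return [
        {"event": "acquisition", "buyer": m.group("buyer"), "target": m.group("target")}
        for m in ACQUISITION.finditer(text)
    ]

# Fictional companies, used only as sample input.
news = "Acme Corp acquired Widget Works for an undisclosed sum. Globex bought Initech last year."
for record in extract_acquisitions(news):
    print(record)
```

Each record could then be stored as a row in a database of company events.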
16. Steps in NLP
• Tokenization
• Stemming
• Lemmatization
• Part of Speech Tagging
• Named Entity Recognition
• Chunking
17. Tokenization
• The process of segmenting running text into words and
sentences.
• Text needs to be segmented into linguistic units such as words,
punctuation, numbers, alphanumeric, etc.
• In English, words are often separated from each other by blanks
(white space), but not all white space is equal.
• Tokenization is an identification of basic units to be processed.
• Identifying the units that do not need to be further
decomposed for subsequent processing is an extremely
important step.
18. Steps in Tokenization
• Segmenting Text into Words
• Handling Abbreviations
• Handling Hyphenated Words
• Numerical and special expressions
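The tokenization steps above can be sketched with a small regular-expression tokenizer. This is a minimal illustration, not a production tokenizer: the abbreviation list and the `tokenize` function are assumptions made for this example, and it only handles straight apostrophes.

```python
import re

# Hypothetical abbreviation list; a real system needs a much larger one.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "U.S.", "e.g.", "i.e."}

TOKEN_PATTERN = re.compile(
    r"""
    \$?\d+(?:[.,]\d+)*%?      # numerical expressions such as 3.50, $1,200, 40%
  | \w+(?:[-']\w+)*           # words, including hyphenated words and contractions
  | [^\w\s]                   # any single punctuation mark
    """,
    re.VERBOSE,
)

def tokenize(text):
    """Split on white space first, then break each piece into tokens."""
    tokens = []
    for piece in text.split():
        if piece in ABBREVIATIONS:
            tokens.append(piece)  # keep the abbreviation and its period together
        else:
            tokens.extend(TOKEN_PATTERN.findall(piece))
    return tokens

print(tokenize("Dr. Smith paid $1,200 for a well-preserved map."))
```

Here "Dr." stays a single token, "$1,200" is kept as one numerical expression, "well-preserved" stays hyphenated, and the final period becomes its own token.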
19. Stemming and Lemmatization
• The goal of both stemming and lemmatization is to reduce
inflectional forms and sometimes derivationally related forms of
a word to a common base form
• Stemming refers to a crude heuristic process that chops off the
ends of words in the hope of achieving this goal correctly most
of the time
• Lemmatization refers to doing things properly with the use of a
vocabulary and morphological analysis of words, normally
aiming to remove inflectional endings only and to return the
base or dictionary form of a word, which is known as the lemma
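The difference between the two approaches can be shown with a toy example. The suffix list and the lemma dictionary below are assumptions made for illustration; real stemmers (such as the Porter stemmer) and lemmatizers use far richer rules and vocabularies.

```python
# Toy suffix list and lemma dictionary, both hypothetical.
SUFFIXES = ("ing", "ed", "es", "s")

LEMMA_DICT = {
    "was": "be",
    "were": "be",
    "better": "good",
    "mice": "mouse",
    "studies": "study",
    "running": "run",
}

def crude_stem(word):
    """Chop off the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in the vocabulary; fall back to the word itself."""
    word = word.lower()
    return LEMMA_DICT.get(word, word)

for w in ["studies", "running", "mice", "was"]:
    print(w, crude_stem(w), lemmatize(w))
```

The stemmer turns "studies" into "studi" and "running" into "runn", neither of which is a dictionary word, and it leaves "mice" unchanged. The dictionary lookup returns "study", "run", and "mouse", but only for words it already knows.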
20. Part of Speech Tagging
• Part-of-speech tagging (POS tagging) is the task of tagging
a word in a text with its part of speech.
• A part of speech is a category of words with similar
grammatical properties.
• Common English parts of speech are noun, verb, adjective,
adverb, pronoun, preposition, conjunction, etc.
21. POS Tagging
• The runner is preparing to start his last race.
• Start = verb or noun?
• Last = noun or adjective?
• Race = verb or noun?
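One simple way to resolve this kind of ambiguity is to look at the tag of the preceding word. The sketch below uses a hand-written mini-lexicon and a few context preferences, all of which are assumptions for illustration; practical taggers learn these preferences statistically from annotated text.

```python
# Hypothetical mini-lexicon: each word maps to the tags it can take,
# with the default tag listed first.
LEXICON = {
    "the": ["DET"],
    "runner": ["NOUN"],
    "is": ["VERB"],
    "preparing": ["VERB"],
    "to": ["PART"],
    "start": ["VERB", "NOUN"],
    "his": ["DET"],  # possessive treated as a determiner here
    "last": ["ADJ", "NOUN", "VERB"],
    "race": ["NOUN", "VERB"],
}

# Hand-written preferences: given the previous tag, which tags are most plausible next?
PREFERRED_AFTER = {
    "PART": ["VERB"],        # "to start"
    "DET": ["ADJ", "NOUN"],  # "his last", "the runner"
    "ADJ": ["NOUN"],         # "last race"
}

def tag(sentence):
    """Tag each word, using the previous tag to choose among candidates."""
    tagged = []
    prev = None
    for word in sentence.lower().rstrip(".").split():
        candidates = LEXICON.get(word, ["NOUN"])  # unknown words default to NOUN
        choice = candidates[0]
        for preferred in PREFERRED_AFTER.get(prev, []):
            if preferred in candidates:
                choice = preferred
                break
        tagged.append((word, choice))
        prev = choice
    return tagged

print(tag("The runner is preparing to start his last race."))
print(tag("The start."))
```

In the first sentence "start" follows "to" and is tagged as a verb, while in "The start." it follows a determiner and is tagged as a noun.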
22. Named Entity Recognition
• Named Entity Recognition is a process where an algorithm
takes a string of text (sentence or paragraph) as input and
identifies relevant nouns (people, places, and organizations)
that are mentioned in that string.
• Named Entity Recognition can automatically scan entire
articles, tweets, research papers, etc. and reveal the
major people, organizations, and places discussed in them.
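A very simple form of entity recognition is a gazetteer lookup: matching text against a list of known names. The sketch below assumes a small hand-built gazetteer drawn from the news excerpt earlier in this deck; the dictionary and the `find_entities` function are illustrative, and real NER systems are usually trained on annotated data so they can recognize names they have never seen.

```python
import re

# Hypothetical gazetteer of known names and their entity types.
GAZETTEER = {
    "John Franklin": "PERSON",
    "Parks Canada": "ORGANIZATION",
    "Terror Bay": "PLACE",
    "Nunavut": "PLACE",
    "Canada": "PLACE",
    "England": "PLACE",
}

def find_entities(text):
    """Return (name, type) pairs for every gazetteer entry found in the text."""
    # Try longer names first so "Parks Canada" wins over "Canada".
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(0), GAZETTEER[m.group(0)]) for m in pattern.finditer(text)]

text = ("the cold, deep waters of Terror Bay in Nunavut, Canada, "
        "according to a study by Parks Canada")
print(find_entities(text))
```

Names that are missing from the gazetteer, such as "HMS Terror", are simply not found, which is the main limitation of a lookup-only approach.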
25. Chunking
• Chunking is also called shallow parsing
• Chunking is a process of extracting phrases from
unstructured text
26. Chunking Challenge Examples
• Joe ate chicken with waffles.
• Joe ate chicken with Mary.
• Joe ate chicken with a knife.
• Joe ate chicken with fear.
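The sketch below shows why these sentences are challenging for a shallow approach. It groups already-tagged words into noun-phrase chunks; the tag names and the `noun_phrase_chunks` function are assumptions for illustration, and the part-of-speech tags are supplied by hand rather than produced by a tagger.

```python
NP_TAGS = {"DET", "ADJ", "NOUN", "PROPN"}

def noun_phrase_chunks(tagged):
    """Group runs of determiner/adjective/noun tags into noun-phrase chunks."""
    chunks, current = [], []
    for word, pos in tagged:
        if pos in NP_TAGS:
            current.append(word)
        elif current:
            # A non-noun-phrase word closes the chunk in progress.
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

waffles = [("Joe", "PROPN"), ("ate", "VERB"), ("chicken", "NOUN"),
           ("with", "ADP"), ("waffles", "NOUN")]
knife = [("Joe", "PROPN"), ("ate", "VERB"), ("chicken", "NOUN"),
         ("with", "ADP"), ("a", "DET"), ("knife", "NOUN")]

print(noun_phrase_chunks(waffles))
print(noun_phrase_chunks(knife))
```

Both sentences yield the same flat pattern of three noun phrases, so the chunker alone cannot tell that the waffles go with the chicken while the knife goes with the eating. Resolving that attachment requires deeper parsing or knowledge about the world.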
28. Build or Buy NLP
• Building your own NLP system from the ground up:
– Need engineer with NLP skills + other developers
– Cost: $x00,000+
– Time: months to years
– Usefulness: limited without major additional work
• Working with an experienced NLP vendor:
– Cost: $x0,000 (basic text analytics and visualization) to low
$x00,000+ (semi-custom NLP application)
– Time: weeks to months
– Usefulness: customized to your specific needs
30. Data for NLP
• Does not fit neatly into tabular relational databases
• The most common use case for the data is agile data
discovery across an enterprise
– Text Analytics/NLP
• Look for
– Search capabilities
– Data management – quick ingest, no modeling required, secure
connections, easy self-service mashups, query operations
– Deployment options
31. NLP …
• Reduces the gap between human
and machine communication
• Automates processes and creates
operational efficiency
• Pushes the barriers of data analysis
by bringing unstructured data into
play
• Extends the capability of existing
business intelligence assets in the
enterprise
32. Second Thursday of Every Month, at 2:00 ET
Presented by: William McKnight
President, McKnight Consulting Group
www.mcknightcg.com (214) 514-1444
#AdvAnalytics