RUDOLF EREMYAN
MACHINE LEARNING SOFTWARE ENGINEER
INTRODUCTION TO NATURAL LANGUAGE
PROCESSING
CONTACTS: EREMYAN.RUDOLF@GMAIL.COM HTTPS://WWW.LINKEDIN.COM/IN/RUDOLFEREMYAN/
CHATBOT FRAMEWORK FOR GEORGIAN
LANGUAGE
TI BOT FOR TBC
BANK
• 35K LIKES
• 100K CONVERSATIONS
• 8K ACTIVE USERS PER MONTH
• 41.5K USERS ASKED ABOUT
WEATHER
• 1K P2P TRANSACTIONS IN
AUGUST
SENTIMENT ANALYSIS ON FACEBOOK
COMMENTS
NATURAL LANGUAGE PROCESSING
https://en.wikipedia.org/wiki/Natural_language_processing
NATURAL LANGUAGE PROCESSING (NLP) IS A FIELD
OF COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE AND
COMPUTATIONAL LINGUISTICS CONCERNED WITH THE
INTERACTIONS BETWEEN COMPUTERS AND HUMAN
(NATURAL) LANGUAGES, AND, IN PARTICULAR,
CONCERNED WITH PROGRAMMING COMPUTERS TO
FRUITFULLY PROCESS LARGE NATURAL LANGUAGE
CORPORA.
THE HISTORY OF NLP
https://en.wikipedia.org/wiki/Natural_language_processing
1950 - ALAN TURING PUBLISHED
AN ARTICLE TITLED "COMPUTING
MACHINERY AND
INTELLIGENCE" WHICH
PROPOSED WHAT IS NOW
CALLED THE TURING TEST AS A
CRITERION OF INTELLIGENCE.
THE HISTORY OF NLP
https://en.wikipedia.org/wiki/Natural_language_processing
1954 - THE GEORGETOWN
EXPERIMENT INVOLVED FULLY
AUTOMATIC TRANSLATION OF
MORE THAN SIXTY RUSSIAN
SENTENCES INTO ENGLISH. THE
AUTHORS CLAIMED THAT WITHIN
THREE OR FIVE YEARS, MACHINE
TRANSLATION WOULD BE A SOLVED
PROBLEM.
THE HISTORY OF NLP
https://en.wikipedia.org/wiki/Natural_language_processing
1970s - MANY PROGRAMMERS BEGAN TO WRITE "CONCEPTUAL ONTOLOGIES", WHICH STRUCTURED REAL-WORLD
INFORMATION INTO COMPUTER-UNDERSTANDABLE DATA. EXAMPLES ARE QUALM (LEHNERT, 1977),
POLITICS (CARBONELL, 1979), AND PLOT UNITS (LEHNERT, 1981). DURING THIS TIME, MANY CHATTERBOTS
WERE WRITTEN, INCLUDING PARRY AND RACTER.
• WORDNET
• EUROWORDNET
• SENTIWORDNET
THE HISTORY OF NLP
https://en.wikipedia.org/wiki/Natural_language_processing
LATE 1980s - THERE WAS A REVOLUTION IN NLP WITH
THE INTRODUCTION OF MACHINE LEARNING
ALGORITHMS FOR LANGUAGE PROCESSING.
PART-OF-SPEECH TAGGING INTRODUCED THE USE OF
HIDDEN MARKOV MODELS TO NLP, AND
INCREASINGLY, RESEARCH HAS FOCUSED ON
STATISTICAL MODELS, WHICH MAKE SOFT,
PROBABILISTIC DECISIONS BASED ON ATTACHING
REAL-VALUED WEIGHTS TO THE FEATURES MAKING
UP THE INPUT DATA.
THE HISTORY OF NLP
https://en.wikipedia.org/wiki/Natural_language_processing
IN RECENT YEARS, THERE HAS BEEN A FLURRY OF RESULTS SHOWING DEEP
LEARNING TECHNIQUES ACHIEVING STATE-OF-THE-ART RESULTS IN MANY
NATURAL LANGUAGE TASKS, FOR EXAMPLE IN LANGUAGE MODELING,
PARSING AND MANY OTHERS.
HAVE YOU EVER USED ANY NLP PRODUCTS?
NLP APPLICATIONS
TEXT CLASSIFICATION
TEXT CLUSTERING
TEXT SUMMARISATION
MACHINE TRANSLATION
SEMANTIC SEARCH
SENTIMENT ANALYSIS
QUESTION ANSWERING
INFORMATION EXTRACTION
NLP. TEXT CLASSIFICATION
Document classification or
document categorization is a
problem in library science,
information science and computer
science. The task is to assign a
document to one or more classes or
categories. This may be done
"manually" or algorithmically.
Popular algorithms:
1. Multinomial Naive Bayes
2. SVM
3. Neural Networks
NLP. TEXT CLUSTERING
Document clustering (or text
clustering) is the application of
cluster analysis to textual
documents. It has applications in
automatic document organization,
topic extraction and fast information
retrieval or filtering.
Popular algorithms:
1. k-Means
2. DBSCAN
3. Deep Learning
NLP. TEXT SUMMARISATION
Automatic summarization is the
process of shortening a text
document with software, in order
to create a summary with the
major points of the original
document. Technologies that
can make a coherent summary
take into account variables such
as length, writing style and
syntax.
Popular algorithms:
1. LDA
2. Deep Learning
NLP. MACHINE TRANSLATION
At its most basic level, MT performs simple substitution of words in
one language for words in another, but that
alone usually cannot produce a good
translation of a text because recognition of
whole phrases and their closest counterparts
in the target language is needed. Solving this
problem with corpus-based statistical and neural
techniques is a rapidly growing field that is
leading to better translations, handling
differences in linguistic typology, translation of
idioms, and the isolation of anomalies.
Algorithms:
1. Rule based
2. Statistical methods
3. Encoder-Decoder
NLP. SEMANTIC SEARCH
Semantic search seeks to
improve search accuracy by
understanding searcher intent
and the contextual meaning of
terms as they appear in the
searchable dataspace, whether
on the Web or within a closed
system, to generate more
relevant results.
Approaches:
1. Entity Recognition
2. User context
NLP. SENTIMENT ANALYSIS
Sentiment Analysis is the
process of determining whether a
piece of writing is positive,
negative or neutral. It's also
known as opinion mining,
deriving the opinion or attitude of
a speaker.
Algorithms:
1. Lexicon-based
2. Machine Learning (SVM)
3. Deep Learning (RNN, LSTM)
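The lexicon-based approach listed first can be sketched in a few lines. This is a minimal illustration, not a production method: the `LEXICON` dictionary and its scores are made up for the example, and real lexicons (such as SentiWordNet, mentioned earlier) are far larger.

```python
import re

# Hypothetical toy lexicon (word -> polarity score); not from the slides.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

def sentiment(text):
    """Label text as positive, negative or neutral by summing word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this bank, great service"))
print(sentiment("Terrible app, I hate it"))
print(sentiment("The branch opens at nine"))
```

A sum of word scores ignores negation ("not good") and sarcasm, which is one reason the machine learning and deep learning approaches in the list are often preferred.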
NLP. QUESTION ANSWERING
Question answering (QA) is a
computer science discipline within
the fields of information retrieval and
natural language processing (NLP),
which is concerned with building
systems that automatically answer
questions posed by humans in a
natural language.
Algorithms:
1. Rule based
2. Machine Learning
3. Deep Learning
NLP. INFORMATION EXTRACTION
Information extraction is the task of automatically
extracting structured information from unstructured
and/or semi-structured machine-readable documents.
NLP TOOLS
1. MORPHOLOGICAL ANALYZER
2. POS TAGGER
3. STEMMER
4. PARSERS
5. NAMED ENTITY RECOGNIZER
NLP. STEMMER
Stemmers remove morphological affixes from words, leaving only the word stem.
bananas -> banana
flies -> fli
cats -> cat
dogs -> dog
How about “flies” -> fly?
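The outputs above match the plural-suffix rules of step 1a of the Porter stemmer (SSES -> SS, IES -> I, SS -> SS, S -> nothing). The sketch below implements only those four rules, so it is a small fragment of a real stemmer; the function name `strip_plural` is chosen here for illustration.

```python
def strip_plural(word):
    """Apply the four plural-suffix rules of Porter stemmer step 1a."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # flies -> fli
    if word.endswith("ss"):
        return word        # caress stays caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word

for w in ["bananas", "flies", "cats", "dogs"]:
    print(w, "->", strip_plural(w))
```

Because the rules look only at the suffix and consult no vocabulary, "flies" becomes "fli" rather than "fly".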
NLP. MORPHOLOGICAL ANALYZER
Lemmatization usually refers to doing things properly
with the use of a vocabulary and morphological
analysis of words, normally aiming to remove
inflectional endings only and to return the base or
dictionary form of a word, which is known as the
lemma.
flies -> fly
went -> go
am, are, is -> be
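The vocabulary-lookup idea can be sketched with a tiny table. This is a toy under stated assumptions: `LEMMA_TABLE` is a hypothetical hand-made vocabulary containing only the examples above, whereas a real morphological analyzer uses a full dictionary plus inflection rules.

```python
# Hypothetical mini-vocabulary for illustration only.
LEMMA_TABLE = {"flies": "fly", "went": "go", "am": "be", "are": "be", "is": "be"}

def lemmatize(word):
    """Return the dictionary form if the word is known, else the lowercased word."""
    key = word.lower()
    return LEMMA_TABLE.get(key, key)

for w in ["flies", "went", "am", "are", "is", "bank"]:
    print(w, "->", lemmatize(w))
```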
NLP. POS TAGGER
A Part-Of-Speech Tagger (POS Tagger) is a piece of
software that reads text in some language and
assigns a part of speech to each word (and other
token), such as noun, verb, adjective, etc., although
computational applications generally use more
fine-grained POS tags like 'noun-plural'.
NLP. PARSER
A natural language parser is a program that works out the grammatical structure of
sentences, for instance, which groups of words go together (as "phrases") and which
words are the subject or object of a verb.
[Figure: dependency tree and constituency tree]
NLP. NAMED ENTITY RECOGNIZER
Named-entity recognition (NER) (also known as entity identification, entity chunking and
entity extraction) is a subtask of information extraction that seeks to locate and classify
named entities in text into pre-defined categories such as the names of persons,
organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
PROJECT. THEORETICAL PART
THERE IS A DATASET OF LABELED TEXTS.
OUR TASK IS TO CREATE A MACHINE LEARNING
PIPELINE FOR TEXT CLASSIFICATION,
TRAINED ON THE GIVEN DATA.
PROJECT. ML PIPELINE
text preprocessing
feature extraction
training classifier
evaluation
PROJECT. TEXT PREPROCESSING
• Removing non-text (e.g., ads, javascript)
• Dealing with text encoding (e.g., Unicode)
• Normalization
– extra-terrestrial/extraterrestrial/extra terrestrial
• Stemming
– computer/computation
• Morphological analysis
– car/cars
• Capitalization
– Now/NOW, led/LED
• Named entity extraction
– USA/usa
• Tokenization
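Several of the steps above can be chained into one function. The following is a minimal sketch using only the Python standard library. The function name `preprocess` and the choice to lowercase everything are assumptions made for illustration; lowercasing deliberately ignores the led/LED and USA/usa distinctions noted in the list.

```python
import html
import re
import unicodedata

def preprocess(raw):
    """Clean markup-laden text and return a list of lowercase tokens."""
    # Removing non-text: drop script blocks first, then any remaining tags.
    text = re.sub(r"(?is)<script.*?>.*?</script>", " ", raw)
    text = re.sub(r"<[^>]+>", " ", text)
    # Text encoding: decode HTML entities, then normalise Unicode forms.
    text = unicodedata.normalize("NFKC", html.unescape(text))
    # Capitalization: fold case (a simplification, see the note above).
    text = text.lower()
    # Tokenization: runs of word characters; hyphens act as separators,
    # so "extra-terrestrial" and "extra terrestrial" yield the same tokens.
    return re.findall(r"\w+", text)

print(preprocess("<p>Extra-terrestrial &amp; NOW</p><script>var x = 1;</script>"))
```

Stemming or morphological analysis would then be applied to each token before feature extraction.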
PROJECT. FEATURE EXTRACTION
1. TF-IDF SCHEME
2. WORD EMBEDDING (WORD2VEC)
PROJECT. FEATURE EXTRACTION.TF-IDF
“TF-IDF is a weighting scheme that assigns each term in a
document a weight based on its term frequency (tf) and inverse
document frequency (idf). The terms with higher weight scores
are considered to be more important. It’s one of the most popular
weighting schemes in Information Retrieval”
PROJECT. FEATURE EXTRACTION.TF-IDF
Term Frequency (TF)
“Term Frequency, which measures how frequently a term occurs in a document.
Since every document is different in length, it is possible that a term would appear
much more times in long documents than shorter ones. Thus, the term frequency is
often divided by the document length as a way of normalization”
TF(t) = (Number of times term t appears in a document) / (Total number of terms
in the document)
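The TF formula can be checked on a small example. This is a minimal sketch; the sample sentence and the function name `term_frequency` are illustrative choices, not part of the original slides.

```python
from collections import Counter

def term_frequency(tokens):
    """TF(t) = occurrences of t in the document / total terms in the document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}

doc = "the cat sat on the mat".split()
tf = term_frequency(doc)
print(tf["the"])  # "the" occurs 2 times among 6 tokens
print(tf["cat"])  # "cat" occurs 1 time among 6 tokens
```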
PROJECT. FEATURE EXTRACTION.TF-IDF
Inverse Document Frequency(IDF)
“IDF: Inverse Document Frequency, which measures how important a term is. While
computing TF, all terms are considered equally important. However it is known that
certain terms, such as "is", "of", and "that", may appear a lot of times but have
little importance. Thus we need to weigh down the frequent terms while scale up
the rare ones, by computing the following:”
IDF(t) = log_e(Total number of documents / Number of documents with term t in
it)
Base-10 logarithms work just as well: they differ from natural logarithms only by a constant factor (ln 10 ≈ 2.303), so the values are smaller but the ranking of terms is unchanged.
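The IDF formula, and its combination with TF into a TF-IDF weight, can be sketched as follows. The three-document corpus is made up for illustration, and the function name `inverse_document_frequency` is an assumption of this example.

```python
import math

def inverse_document_frequency(docs):
    """docs is a list of token lists; returns {term: ln(N / df(term))}."""
    n_docs = len(docs)
    df = {}
    for tokens in docs:
        for term in set(tokens):  # count each document at most once per term
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n_docs / count) for term, count in df.items()}

# Hypothetical toy corpus.
docs = [
    "the cat sat".split(),
    "the dog barked".split(),
    "the cat purred".split(),
]
idf = inverse_document_frequency(docs)

# TF-IDF weight of "cat" in the first document: TF * IDF.
tf_cat = docs[0].count("cat") / len(docs[0])
tfidf_cat = tf_cat * idf["cat"]
print(idf["the"], idf["cat"], idf["dog"], tfidf_cat)
```

"the" appears in every document, so its IDF is ln(3/3) = 0 and it contributes nothing to the TF-IDF weight, while the rarer "dog" gets the largest IDF.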
PROJECT. FEATURE EXTRACTION.WORD2VEC
WORD2VEC is used for learning vector
representations of words, called "word
embeddings".
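Training word2vec requires a corpus and a dedicated library, which is beyond a short example. The sketch below shows only how word vectors are typically compared once they exist, using cosine similarity. The three-dimensional vectors in `EMBEDDINGS` are made up for illustration; learned embeddings have many more dimensions.

```python
import math

# Made-up vectors for illustration only; real embeddings are learned from text.
EMBEDDINGS = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]))
print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"]))
```

With these toy values "king" is closer to "queen" than to "apple", which is the kind of relationship word embeddings are meant to capture.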
PROJECT. TRAINING CLASSIFIER
CLASSIFICATION ALGORITHMS
1. Support Vector Machines
2. k-Nearest Neighbors
3. Multinomial Naive Bayes
PROJECT. TRAINING CLASSIFIER
Support Vector Machines
PROJECT. TRAINING CLASSIFIER
k-Nearest Neighbors
PROJECT. TRAINING CLASSIFIER
Multinomial Naive Bayes
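A minimal sketch of multinomial Naive Bayes for text, written from scratch with add-one (Laplace) smoothing. The class name, the toy training set and its labels are all hypothetical; in practice a library implementation would normally be used instead.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_label / n_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training set.
train_docs = [
    "transfer money to my account".split(),
    "send money to a friend".split(),
    "will it rain tomorrow".split(),
    "is it sunny today".split(),
]
train_labels = ["banking", "banking", "weather", "weather"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("send money tomorrow".split()))
print(model.predict("rain today".split()))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen word from forcing a class probability to zero.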
PROJECT. TEXT CLASSIFICATION EVALUATION
“If you cannot measure it, you cannot improve it”
Lord Kelvin
Main metrics for Text Classification:
Precision and Recall
Precision and recall are the measures used in the information
retrieval domain to measure how well an information retrieval
system retrieves the relevant documents requested by a user.
The measures are defined as follows:

Precision  =  Total number of documents retrieved that are
relevant/Total number of documents that are retrieved.

Recall  =  Total number of documents retrieved that are
relevant/Total number of relevant documents in the database.
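Both definitions translate directly into set operations. This is a minimal sketch; the document identifiers are made up, and returning 0.0 for an empty set is a convention chosen here to avoid division by zero.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved documents that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(p, r)  # precision 2/4, recall 2/3
```

For text classification the same formulas apply per class: "retrieved" is the set of documents the classifier assigned to the class, and "relevant" is the set that truly belongs to it.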
NLP FRAMEWORKS
QUESTIONS?

notes on Evolution Of Analytic Scalability.ppt
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 

DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan

  • 1. RUDOLF EREMYAN MACHINE LEARNING SOFTWARE ENGINEER INTRODUCTION TO NATURAL LANGUAGE PROCESSING CONTACTS: EREMYAN.RUDOLF@GMAIL.COM HTTPS://WWW.LINKEDIN.COM/IN/RUDOLFEREMYAN/
  • 2. CHATBOT FRAMEWORK FOR GEORGIAN LANGUAGE TI BOT FOR TBC BANK • 35K LIKES • 100K CONVERSATIONS • 8K ACTIVE USERS PER MONTH • 41.5K USERS ASKED ABOUT WEATHER • 1K P2P TRANSACTIONS IN AUGUST
  • 3. SENTIMENT ANALYSIS ON FACEBOOK COMMENTS
  • 4. NATURAL LANGUAGE PROCESSING https://en.wikipedia.org/wiki/Natural_language_processing NATURAL LANGUAGE PROCESSING (NLP) IS A FIELD OF COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL LINGUISTICS CONCERNED WITH THE INTERACTIONS BETWEEN COMPUTERS AND HUMAN (NATURAL) LANGUAGES, AND, IN PARTICULAR, CONCERNED WITH PROGRAMMING COMPUTERS TO FRUITFULLY PROCESS LARGE NATURAL LANGUAGE CORPORA.
  • 5. THE HISTORY OF NLP https://en.wikipedia.org/wiki/Natural_language_processing 1950 - ALAN TURING PUBLISHED AN ARTICLE TITLED "COMPUTING MACHINERY AND INTELLIGENCE" WHICH PROPOSED WHAT IS NOW CALLED THE TURING TEST AS A CRITERION OF INTELLIGENCE.
  • 6. THE HISTORY OF NLP https://en.wikipedia.org/wiki/Natural_language_processing 1954 - THE GEORGETOWN EXPERIMENT INVOLVED FULLY AUTOMATIC TRANSLATION OF MORE THAN SIXTY RUSSIAN SENTENCES INTO ENGLISH. THE AUTHORS CLAIMED THAT WITHIN THREE OR FIVE YEARS, MACHINE TRANSLATION WOULD BE A SOLVED PROBLEM.
  • 7. THE HISTORY OF NLP https://en.wikipedia.org/wiki/Natural_language_processing 1970 - MANY PROGRAMMERS BEGAN TO WRITE "CONCEPTUAL ONTOLOGIES", WHICH STRUCTURED REAL-WORLD INFORMATION INTO COMPUTER-UNDERSTANDABLE DATA. EXAMPLES ARE QUALM (LEHNERT, 1977), POLITICS (CARBONELL, 1979), AND PLOT UNITS (LEHNERT, 1981). DURING THIS TIME, MANY CHATTERBOTS WERE WRITTEN, INCLUDING PARRY AND RACTER. • WORDNET • EUROWORDNET • SENTIWORDNET
  • 8. THE HISTORY OF NLP https://en.wikipedia.org/wiki/Natural_language_processing 1980 - THERE WAS A REVOLUTION IN NLP WITH THE INTRODUCTION OF MACHINE LEARNING ALGORITHMS FOR LANGUAGE PROCESSING. PART-OF-SPEECH TAGGING INTRODUCED THE USE OF HIDDEN MARKOV MODELS TO NLP, AND INCREASINGLY, RESEARCH HAS FOCUSED ON STATISTICAL MODELS, WHICH MAKE SOFT, PROBABILISTIC DECISIONS BASED ON ATTACHING REAL-VALUED WEIGHTS TO THE FEATURES MAKING UP THE INPUT DATA.
  • 9. THE HISTORY OF NLP https://en.wikipedia.org/wiki/Natural_language_processing IN RECENT YEARS, THERE HAS BEEN A FLURRY OF RESULTS SHOWING DEEP LEARNING TECHNIQUES ACHIEVING STATE-OF-THE-ART RESULTS IN MANY NATURAL LANGUAGE TASKS, FOR EXAMPLE IN LANGUAGE MODELING, PARSING AND MANY OTHERS.
  • 10. HAVE YOU EVER USED ANY NLP PRODUCTS?
  • 11. HAVE YOU EVER USED ANY NLP PRODUCTS?
  • 12. NLP APPLICATIONS: TEXT CLASSIFICATION, TEXT CLUSTERING, TEXT SUMMARISATION, MACHINE TRANSLATION, SEMANTIC SEARCH, SENTIMENT ANALYSIS, QUESTION ANSWERING, INFORMATION EXTRACTION
  • 13. NLP. TEXT CLASSIFICATION Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" or algorithmically. Popular algorithms: 1. Multinomial Naive Bayes 2. SVM 3. Neural Networks
  • 14. NLP. TEXT CLUSTERING Document clustering (or text clustering) is the application of cluster analysis to textual documents. It has applications in automatic document organization, topic extraction and fast information retrieval or filtering. Popular algorithms: 1. k-Means 2. DBSCAN 3. Deep Learning
  • 15. NLP. TEXT SUMMARISATION Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Popular algorithms: 1. LDA 2. Deep Learning
  • 16. NLP. MACHINE TRANSLATION At its most basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed. Solving this problem with corpus, statistical, and neural techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies. Algorithms: 1. Rule based 2. Statistical methods 3. Encoder-Decoder
  • 17. NLP. SEMANTIC SEARCH Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results. Approaches: 1. Entity Recognition 2. User context
  • 18. NLP. SENTIMENT ANALYSIS Sentiment Analysis is the process of determining whether a piece of writing is positive, negative or neutral. It's also known as opinion mining, deriving the opinion or attitude of a speaker. Algorithms: 1. Lexicon-based 2. Machine Learning (SVM) 3. Deep Learning (RNN, LSTM)
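The lexicon-based approach listed on this slide can be illustrated with a minimal sketch. The word list `TOY_LEXICON` and its scores are hypothetical values chosen for illustration, not a resource from the talk; real systems typically rely on a curated lexicon such as SentiWordNet and handle negation, which this sketch does not.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (an assumption, not from the talk).
TOY_LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "hate": -2,
}

def lexicon_sentiment(text):
    """Label text as positive, negative or neutral by summing word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(TOY_LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("I love this great bank"))   # positive
print(lexicon_sentiment("Terrible service"))         # negative
print(lexicon_sentiment("The weather today"))        # neutral
```

Words missing from the lexicon contribute nothing, so coverage of the lexicon largely determines how useful this kind of classifier is.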
  • 19. NLP. QUESTION ANSWERING Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language. Algorithms: 1. Rule based 2. Machine Learning 3. Deep Learning
  • 20. NLP. INFORMATION EXTRACTION Information extraction is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents.
  • 21. NLP TOOLS 1. MORPHOLOGICAL ANALYZER 2. POS TAGGER 3. STEMMER 4. PARSERS 5. NAMED ENTITY RECOGNIZER
  • 22. NLP. STEMMER Stemmers remove morphological affixes from words, leaving only the word stem. bananas -> banana flies -> fli cats -> cat dogs -> dog How about “flies” -> fly?
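A toy suffix-stripping stemmer reproduces the examples on this slide, including the non-word stem "fli". The function `toy_stem` and its two rules are assumptions made for illustration; they are far simpler than production stemmers such as Porter's.

```python
def toy_stem(word):
    """Strip a plural-like suffix using two hand-written rules (illustrative only)."""
    word = word.lower()
    # Rule 1: "-ies" -> "-i", which is why "flies" becomes the non-word "fli".
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "i"
    # Rule 2: drop a trailing "s", but leave words ending in "ss" alone.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

for w in ["bananas", "flies", "cats", "dogs"]:
    print(w, "->", toy_stem(w))
# bananas -> banana
# flies -> fli
# cats -> cat
# dogs -> dog
```

The stem only needs to be consistent across word forms, not a dictionary word, which is the gap the slide's closing question points at.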
  • 23. NLP. MORPHOLOGICAL ANALYZER Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma. flies -> fly went -> go am, are, is -> be
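In contrast to stemming, lemmatization needs a vocabulary. The sketch below assumes a small hand-written table of irregular forms (`IRREGULAR_FORMS`, hypothetical) plus two regular rules; a real morphological analyzer uses a full dictionary and part-of-speech information.

```python
# Hypothetical lookup table of irregular forms (illustrative, not exhaustive).
IRREGULAR_FORMS = {"went": "go", "am": "be", "are": "be", "is": "be"}

def toy_lemmatize(word):
    """Return a dictionary form using a lookup table and two regular rules."""
    word = word.lower()
    if word in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[word]
    # Regular rule: "-ies" -> "-y" (flies -> fly).
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    # Regular rule: drop a plural "s", but not from words ending in "ss" or "is".
    if word.endswith("s") and not word.endswith(("ss", "is")) and len(word) > 3:
        return word[:-1]
    return word

for w in ["flies", "went", "am", "are", "is", "cats"]:
    print(w, "->", toy_lemmatize(w))
# flies -> fly
# went -> go
# am -> be
# are -> be
# is -> be
# cats -> cat
```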
  • 25. NLP. POS TAGGER A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other tokens), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'.
  • 27. NLP. PARSER A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Dependency tree Constituency tree
  • 28. NLP. NAMED ENTITY RECOGNIZER Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
  • 29. PROJECT. THEORETICAL PART THERE IS A DATASET OF LABELED TEXTS. OUR TASK IS TO CREATE A MACHINE LEARNING PIPELINE FOR TEXT CLASSIFICATION, TRAINED ON THE GIVEN DATA.
  • 30. PROJECT. ML PIPELINE 1. text preprocessing 2. feature extraction 3. training classifier 4. evaluation
  • 31. PROJECT. TEXT PREPROCESSING • Removing non-text (e.g., ads, javascript) • Dealing with text encoding (e.g., Unicode) • Normalization – extra-terrestrial/extraterrestrial/extra terrestrial • Stemming – computer/computation • Morphological analysis – car/cars • Capitalization – Now/NOW, led/LED • Named entity extraction – USA/usa • Tokenization
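A few of the preprocessing steps listed on this slide can be sketched with the standard library alone. The function `preprocess` and its regular expressions are illustrative assumptions, not the pipeline used in the talk; stemming, morphological analysis and named entity extraction are left out.

```python
import re
import unicodedata

def preprocess(raw):
    """Strip markup, normalize Unicode, lowercase, and tokenize (minimal sketch)."""
    # Dealing with text encoding: fold compatibility characters into one form.
    text = unicodedata.normalize("NFKC", raw)
    # Removing non-text: drop script blocks first, then any remaining tags.
    text = re.sub(r"<script.*?</script>", " ", text, flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", text)
    # Capitalization: lowercasing merges Now/NOW, but also led/LED.
    text = text.lower()
    # Tokenization: runs of word characters, keeping internal hyphens together.
    return re.findall(r"\w+(?:-\w+)*", text)

html = "<p>Extra-terrestrial life NOW!</p><script>var x = 1;</script>"
print(preprocess(html))  # ['extra-terrestrial', 'life', 'now']
```

Lowercasing everything is the simplest choice, but as the led/LED and USA/usa examples suggest, it can erase distinctions that matter, so some pipelines extract named entities before changing case.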
  • 32. PROJECT. FEATURE EXTRACTION 1. TF-IDF SCHEME 2. WORD EMBEDDING (WORD2VEC)
  • 33. PROJECT. FEATURE EXTRACTION.TF-IDF “TF-IDF is a weighting scheme that assigns each term in a document a weight based on its term frequency (tf) and inverse document frequency (idf). The terms with higher weight scores are considered to be more important. It’s one of the most popular weighting schemes in Information Retrieval”
  • 34. PROJECT. FEATURE EXTRACTION.TF-IDF Term Frequency (TF) “Term Frequency, which measures how frequently a term occurs in a document. Since every document is different in length, it is possible that a term would appear much more times in long documents than shorter ones. Thus, the term frequency is often divided by the document length as a way of normalization” TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)
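The TF formula on this slide translates directly into code. The helper name `term_frequency` is an assumption for illustration, and the input is taken to be an already tokenized document.

```python
from collections import Counter

def term_frequency(tokens):
    """TF(t) = (occurrences of t in the document) / (total terms in the document)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

doc = ["the", "cat", "sat", "on", "the", "mat"]
tf = term_frequency(doc)
print(tf["the"])  # 2 occurrences out of 6 terms
print(tf["cat"])  # 1 occurrence out of 6 terms
```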
  • 35. PROJECT. FEATURE EXTRACTION.TF-IDF Inverse Document Frequency(IDF) “IDF: Inverse Document Frequency, which measures how important a term is. While computing TF, all terms are considered equally important. However it is known that certain terms, such as "is", "of", and "that", may appear a lot of times but have little importance. Thus we need to weigh down the frequent terms while scale up the rare ones, by computing the following:” IDF(t) = log_e(Total number of documents / Number of documents with term t in it) Base 10 logarithms are just as good as these although the values are considerably smaller.
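Combining the two formulas gives the TF-IDF weight of a term in a document. The sketch below uses the plain, unsmoothed definitions quoted on these slides with the natural logarithm; the function names are assumptions, and library implementations often add smoothing, so their numbers can differ.

```python
import math
from collections import Counter

def inverse_document_frequency(docs):
    """IDF(t) = log_e(total documents / documents containing t)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tf_idf(docs):
    """Return one {term: TF * IDF} dictionary per tokenized document."""
    idf = inverse_document_frequency(docs)
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return weights

docs = [["the", "cat"], ["the", "dog"], ["the", "bird", "sings"]]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A term that occurs in every document, such as "the" here, gets IDF = log(1) = 0 and therefore zero weight, which is the down-weighting of frequent terms that the slide describes.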
  • 37. PROJECT. FEATURE EXTRACTION.WORD2VEC WORD2VEC is used for learning vector representations of words, called "word embeddings".
  • 45. PROJECT. TRAINING CLASSIFIER CLASSIFICATION ALGORITHMS 1. Support Vector Machines 2. k-Nearest Neighbors 3. Multinomial Naive Bayes
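Of the three algorithms listed, Multinomial Naive Bayes is compact enough to sketch from scratch. The class `ToyMultinomialNB`, the add-one (Laplace) smoothing, and the tiny training set are assumptions for illustration; the talk does not specify an implementation, and in practice a library classifier would normally be used.

```python
import math
from collections import Counter, defaultdict

class ToyMultinomialNB:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_label / n_docs)    # log prior
            for token in tokens:
                if token not in self.vocab:
                    continue                      # ignore unseen tokens
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training data, loosely modeled on a banking chatbot.
train_docs = [
    ["transfer", "money", "account"],
    ["card", "balance", "account"],
    ["rain", "sunny", "forecast"],
    ["weather", "rain", "tomorrow"],
]
train_labels = ["banking", "banking", "weather", "weather"]

model = ToyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["account", "balance"]))  # banking
print(model.predict(["rain", "forecast"]))    # weather
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.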
  • 49. PROJECT. TEXT CLASSIFICATION EVALUATION “If you cannot measure it, you can not improve it” Lord Kelvin Main metrics for Text Classification: Precision and Recall. Precision and recall are the measures used in the information retrieval domain to measure how well an information retrieval system retrieves the relevant documents requested by a user. The measures are defined as follows: Precision = (Total number of documents retrieved that are relevant) / (Total number of documents that are retrieved). Recall = (Total number of documents retrieved that are relevant) / (Total number of relevant documents in the database).
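The two definitions above can be computed from sets of document identifiers. The function `precision_recall` and the example IDs are assumptions for illustration; for text classification, "retrieved" corresponds to the documents the classifier assigned to a class and "relevant" to the documents that truly belong to it.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved documents that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 relevant overall, 2 in common.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 4, 6})
print(precision)  # 0.5
print(recall)     # 2 of the 3 relevant documents were retrieved
```

Returning 0.0 when a set is empty is a convention chosen here to avoid division by zero; some tools report the value as undefined instead.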