Basic Natural Language Processing
using
Natural (JavaScript/Node) Library
Aniruddha Chakrabarti
AVP and Chief Architect, Digital, Mphasis
@anchakra | Linkedin.com/in/aniruddhac | slideshare.net/aniruddha.chakrabarti/
Agenda
• Emergence of Artificial Intelligence, AI First
• What is Natural Language Processing (NLP)
• Natural JavaScript/Node NLP Library
• Tokenization - Word Tokenizer
• Stemming and Lemmatization
• String Distance
• Inflectors
• Phonetics
• N-Grams
• Classifier
• tf-idf
• POS Tagger
• Spell Check
Third Era of Computing * - AI First/AI Everywhere (Cognitive Systems)
Tabulating Machines (1960 – 1980)
→ Turing Machine
→ Automating manual processes, tabulating data
→ Reducing manual effort and time
Programmable Systems (1980 – 2010)
→ IBM System/360 (S/360), Mainframes, AS/400
→ Computing Power (Moore’s Law)
→ Systems need to be explicitly programmed using explicit logic and rules (pre-programmed)
→ Personal Computers (PCs), Communication (Networked PCs, Client/Server, Internet, WWW)
→ Automating business processes
→ Mostly structured data
AI First/AI Everywhere (Cognitive Systems) (2010 – Current)
→ Systems that learn from historical data and can make predictions. Not rule-based systems.
→ Uses Machine Learning, NLP to analyze unstructured data (text, image, audio, video)
→ Predictive Analytics, Deep Learning, Neural Nets
→ OCR, Speech recognition, Text to speech, Face recognition, Video analysis, …
→ Cognitive Services (pay as you go model) – IBM Watson, Microsoft Cognitive Services, …
→ Robotics, Internet of Things, Conversational Systems, Wearables, Blur of physical & virtual
→ Still mostly Weak AI / Narrow AI
Real AI ? (?)
→ Strong AI / Full AI
→ Artificial General Intelligence (AGI)
AI Winter / AI Summer
* From “The Computing Universe” by Tony Hey and Gyuri Pápay
• Artificial Intelligence has emerged as the third era of computing, after tabulating machines and
programmable systems.
Gartner Hype Cycle … 2017
• AI technologies like Cognitive Computing, Virtual
Assistants/Chatbot, Conversational AI, Machine
Learning, Deep Learning and Autonomous Vehicles
appear at the peak in Gartner Hype Cycle of Emerging
Technologies, 2017.
• Reinforcement Learning and Artificial General
Intelligence (AGI) have appeared at the starting point of the
hype cycle – they are expected to peak in the coming years.
Emergence of “AI Everywhere”
Gartner reckons AI is one of the
three mega trends. AI
technologies like
Conversational UI, Machine
Learning, Deep Learning and
Cognitive Computing
constitute “AI Everywhere”
What is Natural Language Processing?
• Field of computer science, artificial intelligence and computational linguistics concerned
with the interactions between computers and human (natural) languages, and, in particular,
concerned with programming computers to fruitfully process large natural language corpora –
Wikipedia
• Broadly categorized into two areas -
▪ Natural Language Understanding (NLU)
▪ Natural Language Generation (NLG)
Some applications of NLP
• Spell correction (MS Word or any other editor)
• Search engines (Google, Bing, Yahoo, Wolfram Alpha)
• Speech engines (Siri, Google Voice, Cortana)
• Personal Voice Assistants (Amazon Alexa, Google Home, …)
• Spam classifiers (All e-mail services)
• News feeds (Google, Yahoo!, and so on)
• Machine translation (Google Translate, and so on)
• Chatbots, Intelligent Virtual Agent/IVA
• IBM Watson, Microsoft LUIS, Amazon Lex/Alexa
NLP Tools & Libraries
• GATE
• Mallet (Java)
• Open NLP – Apache (Java)
• UIMA
• CoreNLP - Stanford CoreNLP toolkit (Java)
• Gensim
• Natural Language Toolkit / NLTK (Python) – by far the most popular NLP library & tool
• spaCy (Python) – a separate library, not built on top of NLTK
• TextBlob
• Natural Library (JavaScript/Node)
What is Natural
• "Natural" is a general natural language processing library for Node.js.
• Supports basic NLP tasks like tokenizing, stemming, classification, phonetics, tf-idf, WordNet,
string similarity, inflections
• At the moment, most of the algorithms are English-specific
• Created by Chris Umbel
• Loosely based on NLTK (Python) NLP Library
• https://github.com/NaturalNode/natural
• http://www.chrisumbel.com/article/node_js_natural_language_porter_stemmer_lancaster_bayes_naive_metaphone_soundex
Natural library install and setup
• Install using npm (package manager for Node); use the -g switch for a global installation
• Include the Natural package through require
npm install -g natural
// include the natural library
let Natural = require('natural');
Tokenization
• A word (Token) is the minimal unit that a machine can understand and process.
• Tokenization is the process of splitting the raw string into meaningful tokens
• Raw text cannot be further processed without going through tokenization.
• Complexity of tokenization varies according to the need of the NLP application, and the
complexity of the language itself.
▪ In English it can be as simple as choosing only words and numbers through a regular
expression. But for Chinese and Japanese, it will be a very complex task.
• Two primary types of tokenizers:
▪ Word Tokenizer: Tokenizes raw text to words
▪ Sentence Tokenizer: Tokenizes raw text to sentences
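To make the two tokenizer types concrete, here is a minimal dependency-free sketch. The function names tokenizeWords and tokenizeSentences are hypothetical helpers invented for this illustration; they are not part of the Natural API, and the regular expressions only cover simple English text.

```javascript
// Hypothetical helpers for illustration; these are not part of the Natural API.

// Word tokenizer: keep runs of letters and digits, drop punctuation.
function tokenizeWords(text) {
  return text.match(/[A-Za-z0-9]+/g) || [];
}

// Sentence tokenizer: a sentence is a run of characters up to and
// including its closing ., ! or ? (a simplification that ignores abbreviations).
function tokenizeSentences(text) {
  var parts = text.match(/[^.!?]+[.!?]*/g) || [];
  return parts
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

var sample = "Hello, how are you? I don't know you!";
console.log(tokenizeWords(sample));
console.log(tokenizeSentences(sample));
```

Real sentence tokenizers also have to handle abbreviations, decimal numbers and quotations, which this sketch deliberately ignores.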
Word Tokenizer
• A word (Token) is the minimal unit that a machine can understand & process
• Tokenization is the process of splitting the raw string into meaningful tokens – Tokenizer
tokenizes or splits raw text into words
• Natural comes with multiple tokenizers -
▪ Word Tokenizer: a tokenizer that divides a text into sequences of alphabetic and
numeric characters. (Ignores punctuation)
▪ Word Punct Tokenizer: Word + punctuation tokenizer. A tokenizer that divides a text into
sequences of alphabetic and non-alphabetic characters.
▪ Treebank Word Tokenizer: uses regular expressions to tokenize text as in Penn
Treebank
▪ Regexp Tokenizer: Tokenizes text using regular expression patterns.
▪ Aggressive Tokenizer:
Word Tokenizer (Cont’d)
var sentence = "Hello, how are you? I don't know you!"
var wordTokenizer = new Natural.WordTokenizer();
var tokens = wordTokenizer.tokenize(sentence);
console.log(tokens);
// prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
var tokenizer = new Natural.WordPunctTokenizer();
var tokens = tokenizer.tokenize(sentence);
console.log(tokens);
// prints [ 'Hello', ', ', 'how', 'are', 'you', '? ', 'I', 'don', "'",
// 't', 'know', 'you', '!' ]
var tokenizer = new Natural.TreebankWordTokenizer();
var tokens = tokenizer.tokenize(sentence);
console.log(tokens);
// prints (as given on the slide) [ 'Hello', ', ', 'how', 'are', 'you', '? ', 'I', 'don', "'",
// 't', 'know', 'you', '!' ]
// Note: Penn Treebank conventions normally split contractions ("don't" -> 'do', "n't"),
// so verify this output against the installed version of Natural
console.log(new Natural.AggressiveTokenizer().tokenize(sentence));
// prints ['Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
Stemming
• Process of reducing inflected or derived words to their word stem, base or root form.
• Similar to cutting down the branches of a tree to its stem
• A crude, rule-based process that clubs together different variations of a token
• Removes -s/-es, -ing or -ed
eating, eats, eaten, eat -> eat
stopping, stopped, stops, stop -> stop
ate -> ate (wrong; should be eat)
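The suffix-stripping idea above can be sketched in a few lines. This is a deliberately crude, hypothetical stemmer (crudeStem is an invented name), much simpler than the Porter or Lancaster algorithms; it only shows why regular forms collapse to one stem while an irregular form such as "ate" is left untouched.

```javascript
// Hypothetical crude stemmer for illustration; far simpler than Porter or Lancaster.
function crudeStem(word) {
  var w = word.toLowerCase();
  var suffixes = ["ing", "ed", "es", "s"];
  for (var i = 0; i < suffixes.length; i++) {
    var suffix = suffixes[i];
    // Require at least three remaining characters so short words survive.
    if (w.length - suffix.length >= 3 && w.endsWith(suffix)) {
      w = w.slice(0, -suffix.length);
      // Undo consonant doubling after -ing/-ed: "stopp" -> "stop".
      if ((suffix === "ing" || suffix === "ed") &&
          /([b-df-hj-np-tv-z])\1$/.test(w)) {
        w = w.slice(0, -1);
      }
      break;
    }
  }
  return w;
}

["eating", "eats", "eat", "stopping", "stopped", "stops", "ate"].forEach(function (word) {
  console.log(word + " -> " + crudeStem(word));
});
```

A rule set this small over-stems and under-stems many words, which is why established algorithms use far larger, carefully ordered rule tables.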
Stemming (Cont’d)
• Different stemming algorithms -
▪ Lovins Stemmer - First published stemmer was written by Julie Beth Lovins in 1968.
Lovins Stemmer is not used currently.
▪ Porter Stemmer - Written by Martin Porter and published in July 1980. Very widely used; it
became the de facto standard algorithm for English stemming.
▪ Lancaster Stemmer - Paice/Husk stemmer developed at Lancaster University. The
stemmer, although remaining efficient and easily implemented, is known to be very
strong and aggressive. The stemmer utilizes a single table of rules, each of which may
specify the removal or replacement of an ending.
▪ Snowball Stemmer – Also called Porter2 stemmer, since this is an updated version of
original Porter Stemmer. Natural does not support Snowball Stemmer
• Lemmatization is a more robust and methodical way of combining grammatical variations to
the root of a word.
▪ Natural does not support any Lemmatization algorithm.
▪ NLTK and other matured NLP libraries support Lemmatization
Stemming – Porter Stemmer and Lancaster Stemmer
var porterStemmer = Natural.PorterStemmer;
console.log(porterStemmer.stem("ate")); // prints at
console.log(porterStemmer.stem("eating")); // prints eat
console.log(porterStemmer.stem("eats")); // prints eat
console.log(porterStemmer.stem("eat")); // prints eat
console.log(porterStemmer.stem("agreement")); // prints agreement
var lancasterStemmer = Natural.LancasterStemmer;
console.log(lancasterStemmer.stem("ate")); // prints at
console.log(lancasterStemmer.stem("eating")); // prints eat
console.log(lancasterStemmer.stem("eats")); // prints eat
console.log(lancasterStemmer.stem("eat")); // prints eat
console.log(lancasterStemmer.stem("agreement")); // prints agr
• Natural supports Porter Stemmer and Lancaster Stemmer only. It does not support Snowball
Stemmer.
• Both the stemmers provide a stem method
Stemming – Porter Stemmer (Non English languages)
• Natural supports Porter Stemmer in Non English languages also
• Following languages are supported -
▪ Farsi - PorterStemmerFa
▪ French - PorterStemmerFr
▪ Russian - PorterStemmerRu
▪ Spanish - PorterStemmerEs
▪ Italian - PorterStemmerIt
▪ Norwegian - PorterStemmerNo
▪ Swedish - PorterStemmerSv
▪ Portuguese - PorterStemmerPt
Lemmatization
• More methodical way of converting all the grammatical/inflected forms of a word to its
root.
• Uses context and part of speech to determine the inflected form of the word and applies
different normalization rules for each part of speech to get the root word (lemma)
• Natural NLP library does not support Lemmatization.
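Since the deck gives no lemmatization code, the following is only a toy sketch of the idea: look the word up in a part-of-speech specific table of irregular forms, then fall back to a rule for that part of speech. The lemmatize function and the tiny tables are made up for illustration; a real lemmatizer relies on a full dictionary such as WordNet.

```javascript
// Hypothetical dictionary-based lemmatizer; the tiny tables are made up for illustration.
var irregularForms = {
  verb: { ate: "eat", eaten: "eat", went: "go" },
  noun: { men: "man", mice: "mouse" }
};

function lemmatize(word, pos) {
  var w = word.toLowerCase();
  var table = irregularForms[pos] || {};
  // Irregular forms are resolved by lookup, which a suffix stripper cannot do.
  if (Object.prototype.hasOwnProperty.call(table, w)) {
    return table[w];
  }
  // Otherwise apply a simple rule that depends on the part of speech.
  if (pos === "verb" && w.endsWith("ing") && w.length > 5) {
    return w.slice(0, -3);
  }
  if (pos === "noun" && w.endsWith("s") && !w.endsWith("ss")) {
    return w.slice(0, -1);
  }
  return w;
}

console.log(lemmatize("ate", "verb"));
console.log(lemmatize("men", "noun"));
```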
Inflector
• Inflectors are used to pluralize or singularize words
• There are different types of Inflectors available in Natural Library
▪ Noun Inflector: pluralize or singularize nouns only
▪ Verb Inflector: Verbs can be pluralized/singularized with a Verb Inflector. Natural
provides an inflector called PresentVerbInflector which works on present tense verbs
only
▪ Both noun and verb inflectors provide singularize and pluralize methods
▪ Number or Count Inflector: Ordinal numbers can be formed from ordinary numbers
▪ Provides a single method called nth which returns the ordinal form of any number
passed
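As an illustration of what an ordinal (count) inflector has to do, here is a small stand-alone sketch. The nth function below is a hypothetical re-implementation written for this example, not Natural's source code, and it assumes a non-negative integer input.

```javascript
// Hypothetical ordinal formatter for illustration; assumes a non-negative integer.
function nth(n) {
  var num = Number(n);
  var lastTwo = num % 100;
  var last = num % 10;
  var suffix = "th";
  // 11, 12 and 13 are exceptions: 11th, 12th, 13th.
  if (lastTwo < 11 || lastTwo > 13) {
    if (last === 1) suffix = "st";
    else if (last === 2) suffix = "nd";
    else if (last === 3) suffix = "rd";
  }
  return num + suffix;
}

[1, 2, 3, 4, 11, 23, 112].forEach(function (n) {
  console.log(nth(n));
});
```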
Inflector (Cont’d)
// pluralize or singularize nouns only
var nounInflector = new Natural.NounInflector();
console.log(nounInflector.pluralize("Book")); // prints Books
console.log(nounInflector.pluralize("radius")); // prints radii
console.log(nounInflector.singularize("flies")); // prints fly
console.log(nounInflector.singularize("men")); // prints man
var countInflector = Natural.CountInflector;
console.log(countInflector.nth("1")); // prints 1st
console.log(countInflector.nth("2")); // prints 2nd
console.log(countInflector.nth("3")); // prints 3rd
console.log(countInflector.nth("4")); // prints 4th
console.log(countInflector.nth("10")); // prints 10th
var verbInflector = new Natural.PresentVerbInflector();
console.log(verbInflector.singularize("go")); // prints goes
console.log(verbInflector.singularize("run")); // prints runs
console.log(verbInflector.pluralize("becomes")); // prints become
console.log(verbInflector.pluralize("presents")); // prints present
N-Grams
• an n-gram is a contiguous sequence of n items from a given sample of text or speech.
• The items can be phonemes, syllables, letters, words or base pairs according to the
application. The n-grams typically are collected from a text or speech corpus.
• When the items are words, n-grams may also be called shingles
• An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram"; size 3 is a "trigram".
• Larger sizes are sometimes referred to by the value of n in modern language, e.g., "four-
gram", "five-gram", and so on.
unigram: Hello how are you -> [Hello] [how] [are] [you]
bigram: Hello how are you -> [Hello how] [how are] [are you]
trigram: Hello how are you -> [Hello how are] [how are you]
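The sliding-window idea behind word n-grams can be sketched without any library. The ngrams function below is a hypothetical helper for illustration and splits on whitespace only, which is an assumption; a real tokenizer would also handle punctuation.

```javascript
// Hypothetical helper for illustration: word n-grams over whitespace-separated tokens.
function ngrams(text, n) {
  var tokens = text.split(/\s+/).filter(function (t) { return t.length > 0; });
  var result = [];
  // Slide a window of n tokens across the sequence.
  for (var i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

console.log(ngrams("Hello how are you", 2));
console.log(ngrams("Hello how are you", 3));
```

A text of k tokens yields k - n + 1 n-grams, and none at all when n is larger than k.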
N-Grams (Cont’d)
var sentence = "Hello how are you";
var ngrams = Natural.NGrams;
console.log(ngrams.bigrams(sentence));
// prints [ [ 'Hello', 'how' ], [ 'how', 'are' ], [ 'are', 'you' ] ]
console.log(ngrams.trigrams(sentence));
// prints [ [ 'Hello', 'how', 'are' ], [ 'how', 'are', 'you' ] ]
console.log(ngrams.ngrams(sentence, 1)); // unigram
// prints [ [ 'Hello' ], [ 'how' ], [ 'are' ], [ 'you' ] ]
sentence = "NLTK is a Natural Language Processing Library in Nodejs";
console.log(ngrams.ngrams(sentence, 4)); // four-gram
// prints [ [ 'NLTK', 'is', 'a', 'Natural' ],
//   [ 'is', 'a', 'Natural', 'Language' ],
//   [ 'a', 'Natural', 'Language', 'Processing' ],
//   [ 'Natural', 'Language', 'Processing', 'Library' ],
//   [ 'Language', 'Processing', 'Library', 'in' ],
//   [ 'Processing', 'Library', 'in', 'Nodejs' ] ]
Phonetics
• A phonetic algorithm is an algorithm for indexing of words by their pronunciation.
• A phonetic matching algorithm is an algorithm that matches words by their pronunciation rather
than their spelling.
• Most phonetic algorithms were developed for use with the English language. Consequently,
applying the rules to words in other languages might not give a meaningful result.
• Some of the well known phonetics algorithms are –
▪ Soundex - Developed to encode surnames for use in censuses. Soundex codes are four-
character strings composed of a single letter followed by three numbers.
▪ Daitch–Mokotoff Soundex - Refinement of Soundex designed to better match surnames of
Slavic & Germanic origin. Daitch–Mokotoff Soundex codes are strings composed of six
numeric digits.
▪ Cologne phonetics - Similar to Soundex, but more suitable for German words.
▪ Metaphone, Double Metaphone, and Metaphone 3 - Suitable for use with most English
words, not just names. Metaphone algorithms are the basis for many popular spell checkers.
▪ New York State Identification and Intelligence System (NYSIIS) - Maps similar phonemes to
the same letter. The result is a string that can be pronounced by the reader without decoding.
▪ Match Rating Approach developed by Western Airlines in 1977 - this algorithm has an
encoding and range comparison technique.
▪ Caverphone, created to assist in data matching between late 19th century and early 20th
century electoral rolls, optimized for accents present in parts of New Zealand.
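To show how a phonetic code is built, here is a compact sketch of the classic American Soundex rules (first letter kept, consonants mapped to digits, adjacent equal codes collapsed, result padded to four characters). The soundex function is a hypothetical stand-alone implementation written for this example and may differ in edge cases from the SoundEx class shipped with Natural.

```javascript
// Hypothetical stand-alone Soundex for illustration; not Natural's implementation.
var SOUNDEX_CODES = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6"
};

function soundex(word) {
  var letters = word.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";
  var result = letters[0].toUpperCase();
  var previous = SOUNDEX_CODES[letters[0]] || "";
  for (var i = 1; i < letters.length && result.length < 4; i++) {
    var ch = letters[i];
    // h and w are skipped and do not separate letters with equal codes.
    if (ch === "h" || ch === "w") continue;
    var code = SOUNDEX_CODES[ch] || ""; // vowels and y have no code
    if (code !== "" && code !== previous) result += code;
    previous = code;
  }
  // Pad with zeros to one letter followed by three digits.
  return (result + "000").slice(0, 4);
}

console.log(soundex("Robert"), soundex("Rupert"));
console.log(soundex("nuremberg") === soundex("nuremburg"));
```

Two words are considered a phonetic match when their codes are equal, which is the comparison a compare-style method performs.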
Phonetics Matching (Cont’d)
• Natural supports Phonetic Matching using three algorithms –
▪ SoundEx
▪ Metaphone
▪ DoubleMetaphone
var metaphone = Natural.Metaphone;
var soundex = Natural.SoundEx;
var doubleMetaphone = Natural.DoubleMetaphone;
// using SoundEx for phonetic matching
console.log(soundex.compare("nuremberg", "nuremburg")); // returns true
console.log(soundex.compare("Paris", "Pari")); // returns false
// using Metaphone for phonetic matching
console.log(metaphone.compare("Fool", "Full")); // returns true
console.log(metaphone.compare("Fool", "Failed")); // returns false
// using Double Metaphone for phonetic matching
console.log(doubleMetaphone.compare("Bangalore", "Bengaluru")); // returns true
console.log(doubleMetaphone.compare("Mumbai", "Bombay")); // returns false
String Distance
• String Distance measures how closely two strings match.
• Natural provides JaroWinkler Distance and Levenshtein Distance algorithms for String
Distance match
JaroWinkler Distance
• Jaro distance between two words is the minimum number of single-character transpositions
required to change one word into the other.
• It is a variant proposed in 1990 by William E. Winkler of the Jaro distance metric (1989,
Matthew A. Jaro).
• Returns a number between 0 and 1 which tells how closely the strings match (0 = no match,
1 = exact match)
// Using JaroWinkler Distance algorithm
console.log(Natural.JaroWinklerDistance("Hello", "Hello")); // returns 1: exact match
console.log(Natural.JaroWinklerDistance("Me", "You")); // returns 0: no match
console.log(Natural.JaroWinklerDistance("Bangalore", "Bengaluru")); // returns 0.72: partial match
console.log(Natural.JaroWinklerDistance("Mumbai", "Bombay")); // returns 0.66: partial match
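For readers who want to see how such a score is computed, the sketch below implements the textbook Jaro similarity (matching characters within a window, minus half the transpositions) and the Winkler prefix bonus with the customary scaling factor 0.1. The functions jaro and jaroWinkler are hypothetical stand-alone helpers; their scores are not guaranteed to equal the figures Natural prints above.

```javascript
// Hypothetical textbook implementation for illustration; not Natural's source code.
function jaro(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  // Characters match only if they are no further apart than this range.
  var range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  var aMatched = new Array(a.length).fill(false);
  var bMatched = new Array(b.length).fill(false);
  var matches = 0;
  for (var i = 0; i < a.length; i++) {
    var start = Math.max(0, i - range);
    var end = Math.min(b.length - 1, i + range);
    for (var j = start; j <= end; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Count matched characters that appear in a different order.
  var outOfOrder = 0;
  var k = 0;
  for (var m = 0; m < a.length; m++) {
    if (!aMatched[m]) continue;
    while (!bMatched[k]) k++;
    if (a[m] !== b[k]) outOfOrder++;
    k++;
  }
  var transpositions = outOfOrder / 2;
  return (matches / a.length + matches / b.length +
          (matches - transpositions) / matches) / 3;
}

function jaroWinkler(a, b) {
  var score = jaro(a, b);
  // Winkler bonus: reward a common prefix of up to four characters.
  var prefix = 0;
  while (prefix < 4 && prefix < a.length && prefix < b.length &&
         a[prefix] === b[prefix]) {
    prefix++;
  }
  return score + prefix * 0.1 * (1 - score);
}

console.log(jaroWinkler("MARTHA", "MARHTA").toFixed(4));
console.log(jaroWinkler("DIXON", "DICKSONX").toFixed(4));
```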
String Distance - Levenshtein Distance
• Levenshtein Distance between two words is the minimum number of single-character edits
(insertions, deletions or substitutions) required to change one word into the other.
• Named after the Soviet mathematician Vladimir Levenshtein, who considered this distance
in 1965
• Also referred to as edit distance
// Using Levenshtein Distance algorithm
console.log(Natural.LevenshteinDistance("Hello", "Hello")); // 0
console.log(Natural.LevenshteinDistance("Bangalore", "Bengaluru")); // 3
console.log(Natural.LevenshteinDistance("Mumbai", "Bombay")); // 3
console.log(Natural.LevenshteinDistance("Chennai", "Madras")); // 6
console.log(Natural.LevenshteinDistance("Nuremberg", "Nuremburg")); // 1
Bangalore -> Bengaluru: 3 character changes
Nuremberg -> Nuremburg: 1 character change
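The edit-distance definition above maps directly onto a small dynamic-programming routine. The levenshtein function below is a hypothetical stand-alone sketch that assumes a cost of 1 for every insertion, deletion and substitution; a library may expose configurable costs.

```javascript
// Hypothetical stand-alone edit distance for illustration; every edit costs 1.
function levenshtein(a, b) {
  // previous[j] holds the distance between the processed prefix of a and b.slice(0, j).
  var previous = [];
  for (var j = 0; j <= b.length; j++) previous[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var current = [i];
    for (var c = 1; c <= b.length; c++) {
      var substitution = a[i - 1] === b[c - 1] ? 0 : 1;
      current[c] = Math.min(
        previous[c] + 1,               // deletion
        current[c - 1] + 1,            // insertion
        previous[c - 1] + substitution // substitution or match
      );
    }
    previous = current;
  }
  return previous[b.length];
}

console.log(levenshtein("Bangalore", "Bengaluru"));
console.log(levenshtein("Nuremberg", "Nuremburg"));
```

Keeping only two rows of the table brings memory use down to the length of the second string.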
tf-idf
• tf–idf or TFIDF is short for term frequency - inverse document frequency
• tf-idf determines how important a word (or words) is to a document relative to a corpus.
• Often used as weighting factor in searches of information retrieval, text mining & user modeling.
• The tf-idf value increases proportionally to the number of times a word appears in the
document and is offset by the frequency of the word in the corpus, which helps to adjust for
the fact that some words appear more frequently in general.
• tfidf method returns the measure of importance of a word
var tfidf = new Natural.TfIdf();
// Documents could be added to tf-idf. Here only a single doc is added, but more could be added
tfidf.addDocument("this document is about node. Its also about NLP. Node is used for it");
// Find out the tf-idf of different words in the document
console.log(tfidf.tfidf("node", 0)); // prints 0.61 as node appears multiple times in the doc
console.log(tfidf.tfidf("NLP", 0)); // prints 0.30 as NLP appears only single time
console.log(tfidf.tfidf("ruby", 0)); // prints 0 as ruby does not appear in the doc
console.log(tfidf.listTerms(0));
// prints [ { term: 'node', tfidf: 0.6137056388801094 },
//   { term: 'document', tfidf: 0.3068528194400547 },
//   { term: 'nlp', tfidf: 0.3068528194400547 },
//   { term: 'used', tfidf: 0.3068528194400547 } ]
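The values printed on this slide are consistent with the weighting tf × (1 + ln(N / (1 + df))), where tf is the count of the term in the document, N the number of documents and df the number of documents containing the term. That formula is inferred here from the printed numbers and should be treated as an assumption rather than a statement about Natural's internals. The sketch below uses hypothetical helpers (tokenize, tfidf) and does no stop-word removal.

```javascript
// Hypothetical helpers for illustration; the weighting formula is an assumption
// inferred from the values printed on the slide.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function tfidf(term, docIndex, documents) {
  var t = term.toLowerCase();
  var tokenized = documents.map(tokenize);
  // Term frequency: raw count of the term in the chosen document.
  var tf = tokenized[docIndex].filter(function (w) { return w === t; }).length;
  // Document frequency: number of documents that contain the term.
  var df = tokenized.filter(function (doc) { return doc.indexOf(t) !== -1; }).length;
  var idf = 1 + Math.log(documents.length / (1 + df));
  return tf * idf;
}

var corpus = ["this document is about node. Its also about NLP. Node is used for it"];
console.log(tfidf("node", 0, corpus).toFixed(2));
console.log(tfidf("NLP", 0, corpus).toFixed(2));
console.log(tfidf("ruby", 0, corpus).toFixed(2));
```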
tf-idf (cont’d)
• Disc files could also be added to tf-idf
• Multiple documents could be added to tf-idf
var tfidf = new Natural.TfIdf();
// Adding files from disc to tfidf
tfidf.addFileSync("C:/Data/Profile.txt");
console.log(tfidf.listTerms(0));
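// Multiple documents added to tf-idf, which together form the entire corpus
// (a fresh instance is used so that document indexes 0-2 refer to the three documents below)
tfidf = new Natural.TfIdf();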
tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it');
tfidf.addDocument('this document is about ruby.');
tfidf.addDocument('this document is about ruby and node.');
console.log(tfidf.tfidf("node", 0)); // prints 2
console.log(tfidf.tfidf("NLP", 0)); // prints 1.40
console.log(tfidf.tfidf("ruby", 0)); // prints 0
console.log(tfidf.tfidf("node", 1)); // prints 0 as node does not appear in 2nd doc
console.log(tfidf.tfidf("ruby", 1)); // prints 1 as ruby appears in 2nd doc
console.log(tfidf.tfidf("node", 2)); // prints 1 as node appears in 3rd doc
console.log(tfidf.tfidf("ruby", 2)); // prints 1 as ruby appears in 3rd doc
tf-idf (cont’d)
• tfidfs method returns the measure of importance of a word in each document of the corpus
• tfidfs method accepts the word and a callback
// Multiple documents added to tf-idf, which together form the entire corpus
tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it');
tfidf.addDocument('this document is about ruby.');
tfidf.addDocument('this document is about ruby and node.');
// tfidfs method is used to find the importance of the word across multiple documents
tfidf.tfidfs('node', function(ctr, measure){
console.log('tf-idf of node in document #' + ctr + ' is ' + measure);
});
POS (Part of Speech) Tagging
• Process of marking up a word in a text (corpus) as corresponding to a particular part of
speech, based on both its definition and its context—i.e., its relationship with adjacent and
related words in a phrase, sentence, or paragraph.
• Also called grammatical tagging or word-category disambiguation.
POS (Part of Speech) Tagging
• Current state-of-the-art POS tagging algorithms can predict the POS of a given word with
high accuracy (approximately 97%), but a lot of research is still going on
in the area of POS tagging.
No Tag Description
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb
POS Tagging – Brill POS Tagger
• Natural supports POS tagging through Brill POS Tagger that implements Eric Brill's
transformational algorithm (transformation rules are specified in external files).
• E. Brill's tagger, one of the most widely used English POS taggers, employs rule-based algorithms.
// The path module is needed to locate the data files
var path = require("path");
// Path where natural library is located
var baseFolder = path.join(path.dirname(require.resolve("natural")), "brill_pos_tagger");
// Rules file located in /data/<language> sub folder under natural library
var rulesFilename = baseFolder + "/data/English/tr_from_posjs.txt";
// Lexicon file located in /data/<language> sub folder under natural library
var lexiconFilename = baseFolder + "/data/English/lexicon_from_posjs.json";
var defaultCategory = 'N';
var lexicon = new Natural.Lexicon(lexiconFilename, defaultCategory);
var rules = new Natural.RuleSet(rulesFilename);
// Any tagger needs lexicon and rules for successful POS tagging of words
// Brill POS Tagger object is created passing lexicon file and rules file location
var tagger = new Natural.BrillPOSTagger(lexicon, rules);
var sentence = "I see the man with the telescope";
var tokenizer = new Natural.WordTokenizer();
// tokenize the sentence to tokens
var tokens = tokenizer.tokenize(sentence);
console.log(tagger.tag(tokens));
// prints [ [ 'I', 'NN' ],
//   [ 'see', 'VB' ],
//   [ 'the', 'DT' ],
//   [ 'man', 'NN' ],
//   [ 'with', 'IN' ],
//   [ 'the', 'DT' ],
//   [ 'telescope', 'NN' ] ]
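The two ingredients mentioned above, a lexicon and transformation rules, can be illustrated with a toy tagger. Everything in this sketch is hypothetical: the tiny lexicon, the single rule and the tagTokens function are invented for the example and do not reflect Natural's data files or rule syntax.

```javascript
// Hypothetical toy lexicon and rule, made up for illustration.
var toyLexicon = {
  "i": "PRP", "see": "VB", "the": "DT", "man": "NN",
  "with": "IN", "to": "TO", "run": "NN"
};
var defaultTag = "NN";

// One Brill-style transformation: change NN to VB when the previous tag is TO.
var toyRules = [
  { from: "NN", to: "VB", previousTag: "TO" }
];

function tagTokens(tokens) {
  // Step 1: initial tagging from the lexicon, falling back to the default tag.
  var tagged = tokens.map(function (token) {
    var key = token.toLowerCase();
    var initial = Object.prototype.hasOwnProperty.call(toyLexicon, key)
      ? toyLexicon[key]
      : defaultTag;
    return [token, initial];
  });
  // Step 2: apply each transformation rule from left to right.
  toyRules.forEach(function (rule) {
    for (var i = 1; i < tagged.length; i++) {
      if (tagged[i][1] === rule.from && tagged[i - 1][1] === rule.previousTag) {
        tagged[i][1] = rule.to;
      }
    }
  });
  return tagged;
}

console.log(tagTokens(["I", "want", "to", "run"]));
```

The initial tags come purely from the lexicon; the rules then correct tags using context, which is the core of the transformation-based approach.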

Mais conteúdo relacionado

Mais procurados

Introduction To Computer Programming
Introduction To Computer ProgrammingIntroduction To Computer Programming
Introduction To Computer ProgrammingHussain Buksh
 
Zafiyet tespiti ve sizma yöntemleri
Zafiyet tespiti ve sizma yöntemleriZafiyet tespiti ve sizma yöntemleri
Zafiyet tespiti ve sizma yöntemleriEPICROUTERS
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT CompilerNetronome
 
LCU14 302- How to port OP-TEE to another platform
LCU14 302- How to port OP-TEE to another platformLCU14 302- How to port OP-TEE to another platform
LCU14 302- How to port OP-TEE to another platformLinaro
 
Introduction to go lang
Introduction to go langIntroduction to go lang
Introduction to go langAmal Mohan N
 
HKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewHKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewLinaro
 
LCA14: LCA14-418: Testing a secure framework
LCA14: LCA14-418: Testing a secure frameworkLCA14: LCA14-418: Testing a secure framework
LCA14: LCA14-418: Testing a secure frameworkLinaro
 
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly Sam Bowne
 
SFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEESFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEELinaro
 
8. Software Development Security
8. Software Development Security8. Software Development Security
8. Software Development SecuritySam Bowne
 
HKG18-402 - Build secure key management services in OP-TEE
HKG18-402 - Build secure key management services in OP-TEEHKG18-402 - Build secure key management services in OP-TEE
HKG18-402 - Build secure key management services in OP-TEELinaro
 
Programming language design and implemenation
Programming language design and implemenationProgramming language design and implemenation
Programming language design and implemenationAshwini Awatare
 
Shaping the Future of Automatic Programming
Shaping the Future of Automatic ProgrammingShaping the Future of Automatic Programming
Shaping the Future of Automatic ProgrammingChristos Tsakostas
 

Mais procurados (20)

Introduction To Computer Programming
Introduction To Computer ProgrammingIntroduction To Computer Programming
Introduction To Computer Programming
 
Zafiyet tespiti ve sizma yöntemleri
Zafiyet tespiti ve sizma yöntemleriZafiyet tespiti ve sizma yöntemleri
Zafiyet tespiti ve sizma yöntemleri
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT Compiler
 
LCU14 302- How to port OP-TEE to another platform
LCU14 302- How to port OP-TEE to another platformLCU14 302- How to port OP-TEE to another platform
LCU14 302- How to port OP-TEE to another platform
 
Introduction to go lang
Introduction to go langIntroduction to go lang
Introduction to go lang
 
HKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting ReviewHKG15-311: OP-TEE for Beginners and Porting Review
HKG15-311: OP-TEE for Beginners and Porting Review
 
Toolchain
ToolchainToolchain
Toolchain
 
Embedded Linux on ARM
Embedded Linux on ARMEmbedded Linux on ARM
Embedded Linux on ARM
 
LCA14: LCA14-418: Testing a secure framework
LCA14: LCA14-418: Testing a secure frameworkLCA14: LCA14-418: Testing a secure framework
LCA14: LCA14-418: Testing a secure framework
 
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
Practical Malware Analysis: Ch 4 A Crash Course in x86 Disassembly
 
Introduction to Linux Drivers
Introduction to Linux DriversIntroduction to Linux Drivers
Introduction to Linux Drivers
 
SFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEESFO15-503: Secure storage in OP-TEE
SFO15-503: Secure storage in OP-TEE
 
8. Software Development Security
8. Software Development Security8. Software Development Security
8. Software Development Security
 
HKG18-402 - Build secure key management services in OP-TEE
HKG18-402 - Build secure key management services in OP-TEEHKG18-402 - Build secure key management services in OP-TEE
HKG18-402 - Build secure key management services in OP-TEE
 
Programming language design and implemenation
Programming language design and implemenationProgramming language design and implemenation
Programming language design and implemenation
 
Advanced C
Advanced C Advanced C
Advanced C
 
Shaping the Future of Automatic Programming
Shaping the Future of Automatic ProgrammingShaping the Future of Automatic Programming
Shaping the Future of Automatic Programming
 
Important adb commands
Important adb commandsImportant adb commands
Important adb commands
 
Threat Intelligence
Threat IntelligenceThreat Intelligence
Threat Intelligence
 
Kali Linux Hakkında Herşey
Kali Linux Hakkında HerşeyKali Linux Hakkında Herşey
Kali Linux Hakkında Herşey
 

Semelhante a NLP Basics with Natural JavaScript Library

Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generatorsPaul Kahoro
 
Natural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital HumanitiesNatural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital HumanitiesXiang Li
 
தமிழ்க்கணிமை கட்டமைப்பு
தமிழ்க்கணிமை கட்டமைப்புதமிழ்க்கணிமை கட்டமைப்பு
தமிழ்க்கணிமை கட்டமைப்புBalaSundaraRaman (Sundar)
 
The Holistic Programmer
The Holistic ProgrammerThe Holistic Programmer
The Holistic ProgrammerAdam Keys
 
Os Keysholistic
Os KeysholisticOs Keysholistic
Os Keysholisticoscon2007
 
Preventing Complexity in Game Programming
Preventing Complexity in Game ProgrammingPreventing Complexity in Game Programming
Preventing Complexity in Game ProgrammingYaser Zhian
 
Programming Languages #devcon2013
Programming Languages #devcon2013Programming Languages #devcon2013
Programming Languages #devcon2013Iván Montes
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
CoreML for NLP (Melb Cocoaheads 08/02/2018)
CoreML for NLP (Melb Cocoaheads 08/02/2018)CoreML for NLP (Melb Cocoaheads 08/02/2018)
CoreML for NLP (Melb Cocoaheads 08/02/2018)Hon Weng Chong
 
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...Apache OpenNLP
 
Polyglot Architecture: A Rational Approach to Software Design
Polyglot Architecture: A Rational Approach to Software DesignPolyglot Architecture: A Rational Approach to Software Design
Polyglot Architecture: A Rational Approach to Software Designkompalg
 
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And...gagravarr
 
An Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingAn Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingTyrone Systems
 
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...What's with the 1s and 0s? Making sense of binary data at scale with Tika and...
What's with the 1s and 0s? Making sense of binary data at scale with Tika and...gagravarr
 
Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch BasicsShifa Khan
 

NLP Basics with Natural JavaScript Library

  • 1. Basic Natural Language Processing using Natural (JavaScript/Node) Library Aniruddha Chakrabarti AVP and Chief Architect, Digital, Mphasis @anchakra | Linkedin.com/in/aniruddhac | slideshare.net/aniruddha.chakrabarti/
  • 2. Agenda • Emergence of Artificial Intelligence, AI First • What is Natural Language Processing (NLP) • Natural JavaScript/Node NLP Library • Tokenization - Word Tokenizer • Stemming and Lemmatization • String Distance • Inflectors • Phonetics • N-Grams • Classifier • tf-idf • POS Tagger • Spell Check
  • 3. Third Era of Computing * - AI First/AI Everywhere (Cognitive Systems)
      • Artificial Intelligence has emerged as the third era of computing, after tabulating machines and programmable systems.
      ▪ Tabulating Machines (1960 – 1980)
          → Turing Machine
          → Automating manual processes, tabulating data
          → Reducing manual effort and time
          → IBM System/360 (S/360), Mainframes, AS/400
      ▪ Programmable Systems (1980 – 2010)
          → Computing Power (Moore’s Law)
          → Systems need to be explicitly programmed using explicit logic and rules (pre-programmed)
          → Personal Computers (PCs), Communication (Networked PCs, Client/Server, Internet, WWW)
          → Automating business processes
          → Mostly structured data
      ▪ AI First/AI Everywhere (Cognitive Systems) (2010 – Current)
          → Systems that learn from historical data and can make predictions; not rule-based systems
          → Uses Machine Learning and NLP to analyze unstructured data (text, image, audio, video)
          → Predictive Analytics, Deep Learning, Neural Nets
          → OCR, Speech recognition, Text to speech, Face recognition, Video analysis, …
          → Cognitive Services (pay-as-you-go model): IBM Watson, Microsoft Cognitive Services, …
          → Robotics, Internet of Things, Conversational Systems, Wearables, Blur of physical & virtual
          → Still mostly Weak AI / Narrow AI
      ▪ Real AI ? (?)
          → Strong AI / Full AI
          → Artificial General Intelligence (AGI)
      ▪ Timeline labels: AI Winter, AI Summer
      * From “The Computing Universe” by Tony Hey and Gyuri Pápay
  • 4. Gartner Hype Cycle … 2017
      • AI technologies like Cognitive Computing, Virtual Assistants/Chatbots, Conversational AI, Machine Learning, Deep Learning and Autonomous Vehicles appear at the peak of the Gartner Hype Cycle for Emerging Technologies, 2017.
      • Reinforcement Learning and Artificial General Intelligence (AGI) have appeared at the start of the hype cycle; they are expected to peak in the coming years.
  • 5. Emergence of “AI Everywhere”
      • Gartner reckons AI to be one of the three megatrends. AI technologies like Conversational UI, Machine Learning, Deep Learning and Cognitive Computing constitute “AI Everywhere”.
  • 6. What is Natural Language Processing?
      • Field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language corpora – Wikipedia
      • Broadly categorized into two areas -
          ▪ Natural Language Understanding (NLU)
          ▪ Natural Language Generation (NLG)
  • 7. Some applications of NLP
      • Spell correction (MS Word / any other editor)
      • Search engines (Google, Bing, Yahoo, Wolfram Alpha)
      • Speech engines (Siri, Google Voice, Cortana)
      • Personal Voice Assistants (Amazon Alexa, Google Home, …)
      • Spam classifiers (all e-mail services)
      • News feeds (Google, Yahoo!, and so on)
      • Machine translation (Google Translate, and so on)
      • Chatbots, Intelligent Virtual Agent/IVA
      • IBM Watson, Microsoft LUIS, Amazon Lex/Alexa
  • 8. NLP Tools & Libraries
      • GATE
      • Mallet (Java)
      • Apache OpenNLP (Java)
      • UIMA
      • CoreNLP - Stanford CoreNLP toolkit (Java)
      • Gensim (Python)
      • Natural Language Toolkit / NLTK (Python) – one of the most widely used NLP libraries
      • spaCy (Python)
      • TextBlob (Python) – built on top of NLTK
      • Natural Library (JavaScript/Node)
  • 9. What is Natural
      • "Natural" is a general natural language processing library for Node.js.
      • Supports basic NLP tasks like tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity and inflections
      • At the moment, most of the algorithms are English-specific
      • Created by Chris Umbel
      • Loosely based on the NLTK (Python) NLP library
      • https://github.com/NaturalNode/natural
      • http://www.chrisumbel.com/article/node_js_natural_language_porter_stemmer_lancaster_bayes_naive_metaphone_soundex
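  • 10. Natural library install and setup
      • Install using npm (package manager for Node). The -g switch installs the package globally; require() resolves packages from the project's node_modules folder by default, so a local install (without -g) is usually what a script needs.
      • Include the Natural package through require
      npm install -g natural
      // include the natural library
      let Natural = require('natural');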
  • 11. Tokenization
      • A word (token) is the minimal unit that a machine can understand and process.
      • Tokenization is the process of splitting the raw string into meaningful tokens.
      • Raw text cannot be further processed without going through tokenization.
      • Complexity of tokenization varies according to the need of the NLP application and the complexity of the language itself.
          ▪ In English it can be as simple as choosing only words and numbers through a regular expression. For Chinese and Japanese, where words are not separated by spaces, it is a much more complex task.
      • Two primary types of tokenizers:
          ▪ Word Tokenizer: tokenizes raw text into words
          ▪ Sentence Tokenizer: tokenizes raw text into sentences
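The regular-expression approach mentioned for English can be sketched in a few lines of plain JavaScript. This is only an illustration and not how Natural implements its tokenizers; the function names `tokenizeWords` and `tokenizeSentences` are made up for this example.

```javascript
// Minimal regex-based tokenizers (illustrative sketch, not Natural's implementation).

// Word tokenizer: keeps runs of letters and digits, drops punctuation.
function tokenizeWords(text) {
  return text.match(/[A-Za-z0-9]+/g) || [];
}

// Sentence tokenizer: splits after '.', '!' or '?'.
// Assumption: every '.', '!' or '?' ends a sentence (abbreviations are not handled).
function tokenizeSentences(text) {
  const parts = text.match(/[^.!?]+[.!?]*/g) || [];
  return parts.map((part) => part.trim()).filter(Boolean);
}

const sample = "Hello, how are you? I don't know you!";
console.log(tokenizeWords(sample));
// prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
console.log(tokenizeSentences(sample));
// prints [ 'Hello, how are you?', "I don't know you!" ]
```

Note how the apostrophe in "don't" splits the word in two; handling contractions properly is one reason richer tokenizers exist.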
  • 12. Word Tokenizer
      • A word (token) is the minimal unit that a machine can understand and process.
      • Tokenization is the process of splitting the raw string into meaningful tokens. A tokenizer splits raw text into words.
      • Natural comes with multiple tokenizers -
          ▪ Word Tokenizer: divides a text into sequences of alphabetic and numeric characters (ignores punctuation).
          ▪ Word Punct Tokenizer: word + punctuation tokenizer. Divides a text into sequences of alphabetic and non-alphabetic characters.
          ▪ Treebank Word Tokenizer: uses regular expressions to tokenize text as in the Penn Treebank.
          ▪ Regexp Tokenizer: tokenizes text using regular expression patterns.
          ▪ Aggressive Tokenizer
  • 13. Word Tokenizer (Cont’d)
      var sentence = "Hello, how are you? I don't know you!";

      // WordTokenizer drops punctuation
      var wordTokenizer = new Natural.WordTokenizer();
      console.log(wordTokenizer.tokenize(sentence));
      // prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]

      // WordPunctTokenizer keeps punctuation marks as separate tokens
      var wordPunctTokenizer = new Natural.WordPunctTokenizer();
      console.log(wordPunctTokenizer.tokenize(sentence));
      // prints [ 'Hello', ',', 'how', 'are', 'you', '?', 'I', 'don', '\'', 't', 'know', 'you', '!' ]

      // TreebankWordTokenizer follows Penn Treebank conventions, which split
      // contractions such as "don't" into 'do' and 'n\'t' rather than 'don' and 't'
      var treebankTokenizer = new Natural.TreebankWordTokenizer();
      console.log(treebankTokenizer.tokenize(sentence));

      // AggressiveTokenizer also drops punctuation
      console.log(new Natural.AggressiveTokenizer().tokenize(sentence));
      // prints [ 'Hello', 'how', 'are', 'you', 'I', 'don', 't', 'know', 'you' ]
  • 14. Stemming
      • Process of reducing inflected or derived words to their word stem, base or root form.
      • Similar to cutting down the branches of a tree to its stem
      • A crude, rule-based process for grouping together different variations of a token
      • Removes suffixes such as -s/-es, -ing or -ed
          eating, eats, eaten, eat -> eat
          stopping, stopped, stops, stop -> stop
          ate -> not reduced to eat (wrong: irregular forms are beyond suffix stripping)
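To make the "crude rule-based" idea concrete, here is a deliberately naive suffix stripper in plain JavaScript. It is not the Porter or Lancaster algorithm and not part of Natural; `naiveStem` and its rule list are assumptions chosen only to reproduce the slide's examples.

```javascript
// Naive suffix-stripping stemmer (illustrative sketch only; real stemmers such as
// Porter apply many more rules and conditions).
function naiveStem(word) {
  const w = word.toLowerCase();
  for (const suffix of ['ing', 'ed', 'es', 's']) {
    // Only strip when at least three characters of stem would remain.
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      let stem = w.slice(0, -suffix.length);
      // Undouble a trailing doubled consonant: "stopp" -> "stop".
      if (/([^aeiou])\1$/.test(stem)) stem = stem.slice(0, -1);
      return stem;
    }
  }
  return w; // no rule applied
}

console.log(['eating', 'eats', 'eat'].map(naiveStem));           // prints [ 'eat', 'eat', 'eat' ]
console.log(['stopping', 'stopped', 'stops'].map(naiveStem));    // prints [ 'stop', 'stop', 'stop' ]
console.log(naiveStem('ate'));                                   // prints ate (irregular form is not handled)
```

The last call shows the limitation the slide points out: without a dictionary, "ate" can never be mapped to "eat".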
  • 15. Stemming (Cont’d)
      • Different stemming algorithms -
          ▪ Lovins Stemmer - The first published stemmer, written by Julie Beth Lovins in 1968. It is rarely used today.
          ▪ Porter Stemmer - Written by Martin Porter and published in July 1980. Very widely used; it became the de facto standard algorithm for English stemming.
          ▪ Lancaster Stemmer - Paice/Husk stemmer developed at Lancaster University. Although efficient and easily implemented, it is known to be very strong and aggressive. It uses a single table of rules, each of which may specify the removal or replacement of an ending.
          ▪ Snowball Stemmer - Also called the Porter2 stemmer, since it is an updated version of the original Porter Stemmer. Natural does not support the Snowball Stemmer.
      • Lemmatization is a more robust and methodical way of mapping grammatical variations to the root of a word.
          ▪ Natural does not support any lemmatization algorithm.
          ▪ NLTK and other mature NLP libraries support lemmatization.
  • 16. Stemming – Porter Stemmer and Lancaster Stemmer
      • Natural supports the Porter Stemmer and Lancaster Stemmer only. It does not support the Snowball Stemmer.
      • Both stemmers provide a stem method

      var porterStemmer = Natural.PorterStemmer;
      console.log(porterStemmer.stem("ate"));        // prints at
      console.log(porterStemmer.stem("eating"));     // prints eat
      console.log(porterStemmer.stem("eats"));       // prints eat
      console.log(porterStemmer.stem("eat"));        // prints eat
      console.log(porterStemmer.stem("agreement"));  // prints agreement

      var lancasterStemmer = Natural.LancasterStemmer;
      console.log(lancasterStemmer.stem("ate"));        // prints at
      console.log(lancasterStemmer.stem("eating"));     // prints eat
      console.log(lancasterStemmer.stem("eats"));       // prints eat
      console.log(lancasterStemmer.stem("eat"));        // prints eat
      console.log(lancasterStemmer.stem("agreement"));  // prints agr
  • 17. Stemming – Porter Stemmer (Non-English languages)
      • Natural also supports the Porter Stemmer for languages other than English
      • The following languages are supported -
          ▪ Farsi - PorterStemmerFa
          ▪ French - PorterStemmerFr
          ▪ Russian - PorterStemmerRu
          ▪ Spanish - PorterStemmerEs
          ▪ Italian - PorterStemmerIt
          ▪ Norwegian - PorterStemmerNo
          ▪ Swedish - PorterStemmerSv
          ▪ Portuguese - PorterStemmerPt
  • 18. Lemmatization
      • A more methodical way of converting all the grammatical/inflected forms of a word to its root.
      • Uses context and part of speech to identify the inflected form of the word, and applies different normalization rules for each part of speech to get the root word (lemma)
      • The Natural NLP library does not support lemmatization.
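Since Natural offers no lemmatizer, the idea can only be shown with a toy. The sketch below looks a word up in a tiny hand-written lexicon keyed by part of speech; the `LEMMAS` table and `lemmatize` function are hypothetical, and a real lemmatizer would rely on a full dictionary plus morphological rules.

```javascript
// Toy dictionary-based lemmatizer (illustrative sketch with a hypothetical lexicon).
const LEMMAS = {
  verb: { ate: 'eat', eaten: 'eat', eating: 'eat', saw: 'see' },
  noun: { saw: 'saw', mice: 'mouse', men: 'man' },
};

// The part of speech selects which table is consulted, so the same surface
// form can map to different lemmas depending on how it is used.
function lemmatize(word, partOfSpeech) {
  const table = LEMMAS[partOfSpeech] || {};
  const key = word.toLowerCase();
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : key;
}

console.log(lemmatize('ate', 'verb')); // prints eat
console.log(lemmatize('saw', 'verb')); // prints see
console.log(lemmatize('saw', 'noun')); // prints saw
```

Unlike a stemmer, the lookup handles the irregular form "ate", and the part of speech decides between "see" and "saw".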
  • 19. Inflector
      • Inflectors are used to pluralize or singularize words
      • There are different types of inflectors available in the Natural library
          ▪ Noun Inflector: pluralizes or singularizes nouns only
          ▪ Verb Inflector: verbs can be pluralized/singularized with a verb inflector. Natural provides an inflector called PresentVerbInflector, which works on present tense verbs only
          ▪ Both the noun and verb inflectors provide singularize and pluralize methods
          ▪ Number or Count Inflector: forms ordinal numbers from cardinal numbers
          ▪ The count inflector provides a single method called nth, which returns the ordinal form of the number passed
  • 20. Inflector (Cont’d)
      // pluralize or singularize nouns only
      var nounInflector = new Natural.NounInflector();
      console.log(nounInflector.pluralize("Book"));     // prints Books
      console.log(nounInflector.pluralize("radius"));   // prints radii
      console.log(nounInflector.singularize("flies"));  // prints fly
      console.log(nounInflector.singularize("men"));    // prints man

      // form ordinal numbers
      var countInflector = Natural.CountInflector;
      console.log(countInflector.nth("1"));   // prints 1st
      console.log(countInflector.nth("2"));   // prints 2nd
      console.log(countInflector.nth("3"));   // prints 3rd
      console.log(countInflector.nth("4"));   // prints 4th
      console.log(countInflector.nth("10"));  // prints 10th

      // pluralize or singularize present tense verbs
      var verbInflector = new Natural.PresentVerbInflector();
      console.log(verbInflector.singularize("go"));       // prints goes
      console.log(verbInflector.singularize("run"));      // prints runs
      console.log(verbInflector.pluralize("becomes"));    // prints become
      console.log(verbInflector.pluralize("presents"));   // prints present
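The ordinal rule behind a count inflector is small enough to write out by hand. The following plain JavaScript sketch is not Natural's code; `ordinal` is a made-up name, and the only subtlety is that 11, 12 and 13 take "th" despite ending in 1, 2 and 3.

```javascript
// Ordinal suffix rule (illustrative sketch, independent of Natural's CountInflector).
function ordinal(n) {
  const value = Number(n);
  const lastTwo = Math.abs(value) % 100;
  const last = Math.abs(value) % 10;
  let suffix = 'th';
  // 11, 12 and 13 are exceptions: they keep "th".
  if (lastTwo < 11 || lastTwo > 13) {
    if (last === 1) suffix = 'st';
    else if (last === 2) suffix = 'nd';
    else if (last === 3) suffix = 'rd';
  }
  return `${value}${suffix}`;
}

console.log([1, 2, 3, 4, 10, 11, 12, 13, 21, 112].map(ordinal));
// prints [ '1st', '2nd', '3rd', '4th', '10th', '11th', '12th', '13th', '21st', '112th' ]
```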
  • 21. N-Grams
      • An n-gram is a contiguous sequence of n items from a given sample of text or speech.
      • The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams are typically collected from a text or speech corpus.
      • When the items are words, n-grams may also be called shingles
      • An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram"; size 3 is a "trigram".
      • Larger sizes are sometimes referred to by the value of n, e.g., "four-gram", "five-gram", and so on.
      • Example for the sentence "Hello how are you":
          ▪ unigrams: Hello | how | are | you
          ▪ bigrams: Hello how | how are | are you
          ▪ trigrams: Hello how are | how are you
  • 22. N-Grams (Cont’d)
      var sentence = "Hello how are you";
      var ngrams = Natural.NGrams;

      console.log(ngrams.bigrams(sentence));
      // prints [ [ 'Hello', 'how' ], [ 'how', 'are' ], [ 'are', 'you' ] ]

      console.log(ngrams.trigrams(sentence));
      // prints [ [ 'Hello', 'how', 'are' ], [ 'how', 'are', 'you' ] ]

      console.log(ngrams.ngrams(sentence, 1)); // unigram
      // prints [ [ 'Hello' ], [ 'how' ], [ 'are' ], [ 'you' ] ]

      sentence = "NLTK is a Natural Language Processing Library in Nodejs";
      console.log(ngrams.ngrams(sentence, 4)); // four-gram
      // prints [ [ 'NLTK', 'is', 'a', 'Natural' ],
      //          [ 'is', 'a', 'Natural', 'Language' ],
      //          [ 'a', 'Natural', 'Language', 'Processing' ],
      //          [ 'Natural', 'Language', 'Processing', 'Library' ],
      //          [ 'Language', 'Processing', 'Library', 'in' ],
      //          [ 'Processing', 'Library', 'in', 'Nodejs' ] ]
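For readers curious about what happens behind such calls, word n-grams are just a sliding window over the token list. The sketch below is a plain JavaScript version written for illustration; `wordNgrams` is a hypothetical helper, and it splits on whitespace only, which is simpler than the tokenization Natural applies.

```javascript
// Sliding-window word n-grams (illustrative sketch, not Natural's NGrams module).
function wordNgrams(text, n) {
  const words = text.split(/\s+/).filter(Boolean);
  const result = [];
  // Each window starts at index i and covers n consecutive words.
  for (let i = 0; i + n <= words.length; i++) {
    result.push(words.slice(i, i + n));
  }
  return result;
}

console.log(wordNgrams("Hello how are you", 2));
// prints [ [ 'Hello', 'how' ], [ 'how', 'are' ], [ 'are', 'you' ] ]
console.log(wordNgrams("Hello how are you", 5));
// prints [] (the sentence has fewer than five words)
```

A sentence of k words yields k - n + 1 n-grams, or none when n is larger than k.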
  • 23. Phonetics
      • A phonetic algorithm is an algorithm for indexing words by their pronunciation.
      • A phonetic matching algorithm matches words by their pronunciation rather than their spelling.
      • Most phonetic algorithms were developed for use with the English language. Consequently, applying the rules to words in other languages might not give a meaningful result.
      • Some of the well-known phonetic algorithms are -
          ▪ Soundex - Developed to encode surnames for use in censuses. Soundex codes are four-character strings composed of a single letter followed by three digits.
          ▪ Daitch–Mokotoff Soundex - Refinement of Soundex designed to better match surnames of Slavic & Germanic origin. Daitch–Mokotoff Soundex codes are strings composed of six numeric digits.
          ▪ Cologne phonetics - Similar to Soundex, but more suitable for German words.
          ▪ Metaphone, Double Metaphone, and Metaphone 3 - Suitable for use with most English words, not just names. Metaphone algorithms are the basis for many popular spell checkers.
          ▪ New York State Identification and Intelligence System (NYSIIS) - Maps similar phonemes to the same letter. The result is a string that can be pronounced by the reader without decoding.
          ▪ Match Rating Approach - Developed by Western Airlines in 1977; it has an encoding and range comparison technique.
          ▪ Caverphone - Created to assist in data matching between late 19th century and early 20th century electoral rolls; optimized for accents present in parts of New Zealand.
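As an illustration of the "one letter followed by three digits" format, here is a compact plain JavaScript sketch of classic American Soundex. It is written from the textbook rules rather than taken from Natural, so its codes may differ from Natural's SoundEx in edge cases; `soundexCode` and `SOUNDEX_DIGITS` are names invented for this example.

```javascript
// Classic American Soundex (illustrative sketch, not Natural's SoundEx).
const SOUNDEX_DIGITS = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6,
};

function soundexCode(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';
  let code = letters[0].toUpperCase();            // the first letter is kept as-is
  let previous = SOUNDEX_DIGITS[letters[0]] || 0;
  for (let i = 1; i < letters.length && code.length < 4; i++) {
    const ch = letters[i];
    if (ch === 'h' || ch === 'w') continue;        // h and w do not break a run of equal digits
    const digit = SOUNDEX_DIGITS[ch] || 0;         // vowels and y have no digit and end the run
    if (digit !== 0 && digit !== previous) code += digit;
    previous = digit;
  }
  return code.padEnd(4, '0');                      // pad short codes with zeros
}

console.log(soundexCode("Robert"), soundexCode("Rupert"));        // prints R163 R163
console.log(soundexCode("nuremberg"), soundexCode("nuremburg"));  // prints N651 N651
console.log(soundexCode("Paris"), soundexCode("Pari"));           // prints P620 P600
```

Two words are considered a phonetic match when their codes are equal, which is why "nuremberg" and "nuremburg" match while "Paris" and "Pari" do not.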
  • 24. Phonetics Matching (Cont’d) • Natural supports Phonetic Matching using three algorithms – ▪ SoundEx ▪ Metaphone ▪ DoubleMetaphone var metaphone = Natural.Metaphone; var soundex = Natural.SoundEx; var doubleMetaphone = Natural.DoubleMetaphone; // using SoundEx for phonetic matching console.log(soundex.compare("nuremberg", "nuremburg")); // returns true console.log(soundex.compare("Paris", "Pari")); // returns false // using Metaphone for phonetic matching console.log(metaphone.compare("Fool", "Full")); // returns true console.log(metaphone.compare("Fool", "Failed")); // returns false // using Double Metaphone for phonetic matching console.log(doubleMetaphone.compare("Bangalore", "Bengaluru")); // returns true console.log(doubleMetaphone.compare("Mumbai", "Bombay")); // returns false
  • 25. String Distance • String Distance measures how closely two strings match. • Natural provides the JaroWinkler Distance and Levenshtein Distance algorithms for string distance matching. JaroWinkler Distance • The Jaro measure is computed from the number of matching characters in the two strings and the number of transpositions among those matches. • JaroWinkler is a variant of the Jaro metric (1989, Matthew A. Jaro) proposed in 1990 by William E. Winkler; it gives extra weight to strings that share a common prefix. • Returns a number between 0 and 1 which tells how closely the strings match (0 = no match, 1 = exact match) // Using JaroWinkler Distance algorithm console.log(Natural.JaroWinklerDistance("Hello", "Hello")); // returns 1: exact match console.log(Natural.JaroWinklerDistance("Me", "You")); // returns 0: no match console.log(Natural.JaroWinklerDistance("Bangalore", "Bengaluru")); // returns 0.72: partial match console.log(Natural.JaroWinklerDistance("Mumbai", "Bombay")); // returns 0.66: partial match
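To make the 0-to-1 score above more concrete, the sketch below computes Jaro and Jaro-Winkler similarity in plain JavaScript. It follows the commonly published definition (a matching window of half the longer length minus one, a prefix scale of 0.1, and at most four prefix characters). The function names are illustrative, the comparison is case-sensitive by assumption, and Natural's own implementation may differ in options and defaults.

```javascript
// Jaro similarity: based on matching characters and transpositions.
function jaro(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  // Characters match only if they are equal and not too far apart.
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(b.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Count matched characters that appear in a different order.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2; // transpositions
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Jaro-Winkler: boosts the Jaro score for a shared prefix (up to 4 characters).
function jaroWinkler(a, b, p = 0.1) {
  const j = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return j + prefix * p * (1 - j);
}

console.log(jaroWinkler("Hello", "Hello"));              // 1
console.log(jaroWinkler("Me", "You"));                   // 0
console.log(jaroWinkler("MARTHA", "MARHTA").toFixed(4)); // 0.9611
```

The prefix bonus is why the measure suits names and short strings, where a typo is more likely near the end than at the start.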
  • 26. String Distance - Levenshtein Distance • Levenshtein Distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. • Named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965 • Also referred to as edit distance // Using Levenshtein Distance algorithm console.log(Natural.LevenshteinDistance("Hello", "Hello")); // 0 console.log(Natural.LevenshteinDistance("Bangalore", "Bengaluru")); // 3 console.log(Natural.LevenshteinDistance("Mumbai", "Bombay")); // 3 console.log(Natural.LevenshteinDistance("Chennai", "Madras")); // 6 console.log(Natural.LevenshteinDistance("Nuremberg", "Nuremburg")); // 1 (Illustration: Bangalore → Bengaluru needs 3 character changes; Nuremberg → Nuremburg needs 1 character change.)
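The edit-distance definition above maps directly onto a small dynamic-programming routine. The sketch below is a plain JavaScript version that assumes a cost of 1 for every insertion, deletion and substitution. The function name `levenshtein` is illustrative and this is not the library's own code.

```javascript
// Minimal sketch of Levenshtein (edit) distance with unit costs.
// prev[j] holds the distance between the first i-1 characters of a
// and the first j characters of b; curr is the row being filled in.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("Bangalore", "Bengaluru")); // 3
console.log(levenshtein("Nuremberg", "Nuremburg")); // 1
```

Keeping only two rows keeps memory proportional to the length of the second string rather than to the product of both lengths.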
  • 27. tf-idf • tf–idf or TFIDF is short for term frequency - inverse document frequency • tf-idf determines how important a word (or words) is to a document relative to a corpus. • Often used as a weighting factor in information retrieval searches, text mining & user modeling. • The tf-idf value increases proportionally to the number of times a word appears in the document and is offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general. • The tfidf method returns the measure of importance of a word in a given document var tfidf = new Natural.TfIdf(); // Documents could be added to tf-idf. Here only a single doc is added, but more could be added tfidf.addDocument("this document is about node. Its also about NLP. Node is used for it"); // Find out the tf-idf of different words in the document console.log(tfidf.tfidf("node", 0)); // prints 0.61 as node appears multiple times in the doc console.log(tfidf.tfidf("NLP", 0)); // prints 0.30 as NLP appears only a single time console.log(tfidf.tfidf("ruby", 0)); // prints 0 as ruby does not appear in the doc console.log(tfidf.listTerms(0)); // prints [ { term: 'node', tfidf: 0.6137056388801094 }, { term: 'document', tfidf: 0.3068528194400547 }, { term: 'nlp', tfidf: 0.3068528194400547 }, { term: 'used', tfidf: 0.3068528194400547 } ]
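The numbers printed above can be reproduced by hand. The sketch below is plain JavaScript and rests on an assumption: the weighting tf × (1 + ln(N / (1 + df))), where tf is the raw count of the word in the document, N is the number of documents and df is the number of documents containing the word. That formula is inferred from the values on the slide, not taken from the library's source, and the helper names `tokenize` and `tfidfOf` are hypothetical.

```javascript
// Split text into lowercase word tokens (simplification: no stop-word removal).
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// tf-idf of `term` in document number `index` of `docs` (an array of strings).
// Assumed weighting: tf = raw count, idf = 1 + ln(N / (1 + df)).
function tfidfOf(term, index, docs) {
  const tokenized = docs.map(tokenize);
  const word = term.toLowerCase();
  const tf = tokenized[index].filter((t) => t === word).length;
  if (tf === 0) return 0; // the word does not occur in this document
  const df = tokenized.filter((tokens) => tokens.includes(word)).length;
  const idf = 1 + Math.log(docs.length / (1 + df));
  return tf * idf;
}

const single = ["this document is about node. Its also about NLP. Node is used for it"];
console.log(tfidfOf("node", 0, single).toFixed(4)); // 0.6137
console.log(tfidfOf("NLP", 0, single).toFixed(4));  // 0.3069
console.log(tfidfOf("ruby", 0, single));            // 0
```

With one document, "node" occurs twice, so its score is 2 × (1 + ln(1/2)) ≈ 0.61, while "NLP" occurs once and scores half of that.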
  • 28. tf-idf (cont’d) • Files on disk could also be added to tf-idf • Multiple documents could be added to tf-idf var tfidf = new Natural.TfIdf(); // Adding a file from disk to tf-idf (it becomes document 0 of this instance) tfidf.addFileSync("C:/Data/Profile.txt"); console.log(tfidf.listTerms(0)); // Start a fresh instance so that the three documents below get indexes 0, 1 and 2 tfidf = new Natural.TfIdf(); // Multiple documents added to tf-idf, which together form the entire corpus tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it'); tfidf.addDocument('this document is about ruby.'); tfidf.addDocument('this document is about ruby and node.'); console.log(tfidf.tfidf("node", 0)); // prints 2 console.log(tfidf.tfidf("NLP", 0)); // prints 1.40 console.log(tfidf.tfidf("ruby", 0)); // prints 0 console.log(tfidf.tfidf("node", 1)); // prints 0 as node does not appear in 2nd doc console.log(tfidf.tfidf("ruby", 1)); // prints 1 as ruby appears in 2nd doc console.log(tfidf.tfidf("node", 2)); // prints 1 as node appears in 3rd doc console.log(tfidf.tfidf("ruby", 2)); // prints 1 as ruby appears in 3rd doc
  • 29. tf-idf (cont’d) • The tfidfs method returns the measure of importance of a word in each of the documents • The tfidfs method accepts the word and a callback, which is called once per document with the document index and the measure // Multiple documents added to tf-idf, which together form the entire corpus tfidf.addDocument('this document is about node. Its also about NLP. Node is used for it'); tfidf.addDocument('this document is about ruby.'); tfidf.addDocument('this document is about ruby and node.'); // tfidfs method is used to find the importance of the word across multiple documents tfidf.tfidfs('node', function(ctr, measure){ console.log('tf-idf of node in document #' + ctr + ' is ' + measure); });
  • 30. POS (Part of Speech) Tagging • Process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. • Also called grammatical tagging or word-category disambiguation.
  • 31. POS (Part of Speech) Tagging • Current state-of-the-art POS tagging algorithms can predict the POS of a given word with a high degree of accuracy (approximately 97%), but a lot of research is still going on in the area of POS tagging. • Tags (Penn Treebank tag set), listed as No. Tag - Description: 1. CC - Coordinating conjunction; 2. CD - Cardinal number; 3. DT - Determiner; 4. EX - Existential there; 5. FW - Foreign word; 6. IN - Preposition or subordinating conjunction; 7. JJ - Adjective; 8. JJR - Adjective, comparative; 9. JJS - Adjective, superlative; 10. LS - List item marker; 11. MD - Modal; 12. NN - Noun, singular or mass; 13. NNS - Noun, plural; 14. NNP - Proper noun, singular; 15. NNPS - Proper noun, plural; 16. PDT - Predeterminer; 17. POS - Possessive ending; 18. PRP - Personal pronoun; 19. PRP$ - Possessive pronoun; 20. RB - Adverb; 21. RBR - Adverb, comparative; 22. RBS - Adverb, superlative; 23. RP - Particle; 24. SYM - Symbol; 25. TO - to; 26. UH - Interjection; 27. VB - Verb, base form; 28. VBD - Verb, past tense; 29. VBG - Verb, gerund or present participle; 30. VBN - Verb, past participle; 31. VBP - Verb, non-3rd person singular present; 32. VBZ - Verb, 3rd person singular present; 33. WDT - Wh-determiner; 34. WP - Wh-pronoun; 35. WP$ - Possessive wh-pronoun; 36. WRB - Wh-adverb
  • 32. POS Tagging – Brill POS Tagger • Natural supports POS tagging through the Brill POS Tagger, which implements Eric Brill's transformational algorithm (transformation rules are specified in external files). • E. Brill's tagger, one of the most widely used English POS taggers, employs rule-based algorithms. // Node's built-in path module is needed for path.join and path.dirname var path = require("path"); // Path where natural library is located var baseFolder = path.join(path.dirname(require.resolve("natural")), "brill_pos_tagger"); // Rules file located in /data/<language> sub folder under natural library var rulesFilename = baseFolder + "/data/English/tr_from_posjs.txt"; // Lexicon file located in /data/<language> sub folder under natural library var lexiconFilename = baseFolder + "/data/English/lexicon_from_posjs.json"; // Category assigned to words that are not found in the lexicon var defaultCategory = 'N'; var lexicon = new Natural.Lexicon(lexiconFilename, defaultCategory); var rules = new Natural.RuleSet(rulesFilename); // Any tagger needs lexicon and rules for successful POS tagging of words // Brill POS Tagger object is created passing the lexicon and rule set objects var tagger = new Natural.BrillPOSTagger(lexicon, rules); var sentence = "I see the man with the telescope"; var tokenizer = new Natural.WordTokenizer(); // tokenize the sentence to tokens var tokens = tokenizer.tokenize(sentence); console.log(tagger.tag(tokens)); // prints [ [ 'I', 'NN' ], [ 'see', 'VB' ], [ 'the', 'DT' ], [ 'man', 'NN' ], [ 'with', 'IN' ], [ 'the', 'DT' ], [ 'telescope', 'NN' ] ]
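The role of the lexicon, the default category and the transformation rules can be illustrated with a toy tagger in plain JavaScript. Everything in this sketch is hypothetical: the five-word lexicon, the single rule, the rule format and the helper name `tagTokens` are made up for illustration and do not reflect the library's lexicon or rule file formats.

```javascript
// Toy lexicon: word -> most likely tag (hypothetical, for illustration only).
const lexicon = new Map([
  ["to", "TO"], ["book", "NN"], ["a", "DT"], ["flight", "NN"], ["the", "DT"]
]);
const defaultCategory = "NN"; // tag assigned to words missing from the lexicon

// Toy transformation rules: retag `from` as `to` when the previous tag is `prevTag`.
const rules = [{ from: "NN", to: "VB", prevTag: "TO" }];

function tagTokens(tokens) {
  // Step 1: initial tagging from the lexicon, falling back to the default.
  const tagged = tokens.map((word) => [
    word,
    lexicon.get(word.toLowerCase()) || defaultCategory
  ]);
  // Step 2: apply each transformation rule in order, scanning left to right.
  for (const rule of rules) {
    for (let i = 1; i < tagged.length; i++) {
      if (tagged[i][1] === rule.from && tagged[i - 1][1] === rule.prevTag) {
        tagged[i][1] = rule.to;
      }
    }
  }
  return tagged;
}

console.log(tagTokens(["I", "want", "to", "book", "a", "flight"]));
// [ [ 'I', 'NN' ], [ 'want', 'NN' ], [ 'to', 'TO' ],
//   [ 'book', 'VB' ], [ 'a', 'DT' ], [ 'flight', 'NN' ] ]
```

The lexicon first tags "book" as a noun, and the contextual rule then corrects it to a verb because it follows "to". Words missing from the toy lexicon ("I", "want") simply receive the default category, which shows why lexicon coverage matters.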