How to parse ‘go’
Natural Language Processing in Ruby
Tom Cartwright
@tomcartwrightuk

keepmebooked
giveaiddirect.com
Python, surely?
Yes. The NLTK is awesome.
But you have a Ruby-based app.
Extracting meaning from human input
Summarisation
Extracting entities
Tagging text
Sentiment analysis
Filtering text
From document level to word level
document > sentence > word

Chunking & segmenting
Breaking text into paragraphs, sentences and other zones
Start with a document/some text:
“The second nonabsolute number is the given time of
arrival, which is now known to be one of those most bizarre
of mathematical concepts, a recipriversexclusion, a number
whose existence can only be defined as being anything other
than itself…”
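Why not just split on full stops? Abbreviations such as "Dr." end in a period without ending the sentence. Below is a minimal sketch in plain Ruby (no gems) of a rule-based splitter that ends a sentence at `.`, `!` or `?` unless the word is on a hand-written abbreviation list. The `split_sentences` method and the abbreviation set are hypothetical; Punkt's advantage is that it learns such abbreviations from text instead of needing them listed by hand.

```ruby
require "set"

# Sentence-ending punctuation, optionally followed by closing quotes/brackets
TERMINAL = /[.!?]["')\]]*\z/

# Hypothetical hand-written abbreviation list (lowercase, no final period)
ABBREVIATIONS = Set["dr", "mr", "mrs", "e.g", "i.e", "etc"]

def split_sentences(text, abbreviations = ABBREVIATIONS)
  sentences = []
  current = []
  text.split.each do |token|
    current << token
    next unless token.match?(TERMINAL)
    # "Dr." ends in a period but does not end the sentence
    next if token.end_with?(".") && abbreviations.include?(token.downcase.chomp("."))
    sentences << current.join(" ")
    current = []
  end
  sentences << current.join(" ") unless current.empty?
  sentences
end

p split_sentences("Dr. Dent lost his towel. He panicked.")
# => ["Dr. Dent lost his towel.", "He panicked."]
```

Passing an empty set shows the naive behaviour: "Dr." becomes a sentence of its own.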

The Punkt sentence tokenizer to the rescue…

require "punkt-segmenter"

text = "The second nonabsolute number is the given time of arrival..."
tokenizer = Punkt::SentenceTokenizer.new(text)

# Split the text into an array of sentence strings
result = tokenizer.sentences_from_text(text,
  :output => :sentences_text)

Training

trainer = Punkt::Trainer.new
trainer.train(bistromatic_text)  # bistromatic_text: a String of your own domain text
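Training is what lets Punkt cope with abbreviations nobody listed for it. The heuristic below is a deliberately crude, hypothetical stand-in for that idea (Punkt itself scores candidates with a log-likelihood ratio scaled by length and other factors, not these thresholds): a short word type that nearly always appears with a trailing period in the training text is treated as an abbreviation. `learn_abbreviations` and its threshold values are assumptions.

```ruby
require "set"

# Toy abbreviation learner (hypothetical; not Punkt's actual algorithm)
def learn_abbreviations(text, min_count: 2, min_ratio: 0.9, max_length: 4)
  total = Hash.new(0)        # occurrences of each word type
  with_period = Hash.new(0)  # occurrences ending in a period
  text.split.each do |token|
    word = token.downcase.gsub(/[^a-z.]/, "") # keep letters and periods only
    type = word.chomp(".")
    next if type.empty?
    total[type] += 1
    with_period[type] += 1 if word.end_with?(".")
  end
  total.select { |type, count|
    count >= min_count &&
      type.length <= max_length &&
      with_period[type].fdiv(count) >= min_ratio
  }.keys.to_set
end

training = "Dr. Dent met Dr. Prefect. Dent said hello. Prefect waved."
p learn_abbreviations(training) # only "dr" qualifies
```

"Prefect." also ends in a period once, but it appears without one elsewhere, so the ratio test rejects it.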

Tokenising
Breaking text into words, phrases and symbols.
"Time is an illusion. Lunchtime doubly so.".split(" ")

#=> ["Time", "is", "an", "illusion.",
     "Lunchtime", "doubly", "so."]

Tokenizer gem
Regexes and rules
class Tokenizer
  FS = Regexp.new('[[:blank:]]+')
  PAIR_PRE = ['(', '{', '[']
  SIMPLE_POST = ['!', '?', ',', ':', ';', '.']
  PAIR_POST = [')', '}', ']']
  PRE_N_POST = ['"', "'"]
  …

require "tokenizer"

tokenizer = Tokenizer::Tokenizer.new
tokenizer.tokenize("Time is an illusion. Lunchtime doubly so.")

#=> ["Time", "is", "an", "illusion", ".",
     "Lunchtime", "doubly", "so", "."]
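The rules behind the class excerpt above can be approximated in a few lines: split on blanks, then peel opening punctuation off the front of each chunk and closing punctuation off the back. This is a minimal sketch reusing the slide's constant lists; it is not the Tokenizer gem's actual implementation, and the top-level `tokenize` method is hypothetical.

```ruby
# Punctuation classes copied from the slide's Tokenizer excerpt
FS = Regexp.new('[[:blank:]]+')
PAIR_PRE = ['(', '{', '[']
SIMPLE_POST = ['!', '?', ',', ':', ';', '.']
PAIR_POST = [')', '}', ']']
PRE_N_POST = ['"', "'"]

PRE = PAIR_PRE + PRE_N_POST                 # may open a token
POST = SIMPLE_POST + PAIR_POST + PRE_N_POST # may close a token

def tokenize(text)
  text.split(FS).flat_map do |chunk|
    chunk = chunk.dup
    prefix = []
    suffix = []
    # Peel leading punctuation, e.g. '("' from '("Time'
    prefix << chunk.slice!(0) while !chunk.empty? && PRE.include?(chunk[0])
    # Peel trailing punctuation, keeping its order, e.g. '."' from 'so."'
    suffix.unshift(chunk.slice!(-1)) while !chunk.empty? && POST.include?(chunk[-1])
    prefix + (chunk.empty? ? [] : [chunk]) + suffix
  end
end

p tokenize("Time is an illusion. Lunchtime doubly so.")
# => ["Time", "is", "an", "illusion", ".", "Lunchtime", "doubly", "so", "."]
```

Because only the ends of each chunk are inspected, internal apostrophes ("don't") and internal periods are left alone.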

Stemming
Jogging => Jog
"jogging".gsub(/.ing/, "")
#=> "jog"

"bring".gsub(/.ing/, "")
#=> "b"
1. Ruby-Stemmer: a multi-language Porter stemmer
2. Text: a Porter stemmer

require "lingua/stemmer"

stemmer = Lingua::Stemmer.new(:language => "en")
stemmer.stem("programming") #=> "program"
stemmer.stem("vimming") #=> "vim"
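The naive `gsub` turns "bring" into "b" because it strips a suffix unconditionally. Porter-style stemmers only strip when the remaining stem passes a test. Below is a minimal sketch of one such rule, loosely following Porter's step 1b: drop -ing or -ed only if the stem still contains a vowel, then undouble a final consonant other than l, s or z. `light_stem` is a hypothetical helper and nowhere near a full Porter stemmer (it ignores the special handling of "y", the -eed rule, and every later step).

```ruby
# Simplified fragment of Porter's step 1b (hypothetical helper, not a full stemmer)
def light_stem(word)
  w = word.downcase
  %w[ing ed].each do |suffix|
    next unless w.end_with?(suffix)
    stem = w[0...-suffix.length]
    # Only strip if the stem still has a vowel (Porter's *v* test, ignoring y)
    next unless stem.match?(/[aeiou]/)
    # Undouble a final double consonant other than l, s or z: "jogg" -> "jog"
    if stem.length >= 2 && stem[-1] == stem[-2] && !"aeioulsz".include?(stem[-1])
      stem = stem[0...-1]
    end
    return stem
  end
  w
end

%w[jogging bring vimming programming falling].each do |word|
  puts "#{word} => #{light_stem(word)}"
end
# jogging => jog
# bring => bring
# vimming => vim
# programming => program
# falling => fall
```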

Parts-of-speech tagging
CC (conjunction): and, but
DET (determiner): this, some
IN (preposition / conjunction): above, about
JJ (adjective): orange, tiny
NNP (proper noun): Camden Pale Ale

A couple of methods:

Regex tagger
/.*ing$/ => VBG
/.*ed$/ => VBD

Lookup on words (how often each word carries each tag), e.g.
calculating: { VBG: 6 }
orange: { JJ: 2, NN: 5 }

A tale of two taggers
EngTagger
• Probabilistic (uses the lookup table from the previous slide)
• Brown corpus trained
• Pure Ruby

rb-brill-tagger
• Rule based
• C extensions
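Brill's rule-based approach starts from each word's most likely tag and then applies learned transformation rules that fix tags based on context. A minimal sketch of the application step only, with one hand-written rule (the classic "to race" example: NN becomes VB after TO). The `Rule` struct and the rule list are hypothetical and not rb-brill-tagger's rule format; learning the rules from a corpus is not shown.

```ruby
# A contextual transformation: retag from_tag as to_tag when the
# previous token is tagged prev_tag
Rule = Struct.new(:from_tag, :to_tag, :prev_tag)

RULES = [Rule.new("NN", "VB", "TO")]

def apply_rules(tagged, rules = RULES)
  tagged = tagged.map(&:dup) # leave the caller's pairs untouched
  rules.each do |rule|       # rules apply in order, each over the whole text
    (1...tagged.length).each do |i|
      if tagged[i][1] == rule.from_tag && tagged[i - 1][1] == rule.prev_tag
        tagged[i][1] = rule.to_tag
      end
    end
  end
  tagged
end

p apply_rules([["expected", "VBN"], ["to", "TO"], ["race", "NN"]])
# => [["expected", "VBN"], ["to", "TO"], ["race", "VB"]]
```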

Treat gem
Bundles many of the gems shown and wraps them in a DSL:
require "treat"
include Treat::Core::DSL

s = sentence("A really good sentence.")
s.do(:chunk, :segment, :tokenize, :parse)

Covers stemming, tokenising, chunking, serialising, tagging, and text extraction from PDFs and HTML.
LRUG Sentiments
A tag: {NN}

Pass in a regex => /({JJ}|{JJS})({NNS}|{NNP})/
and some tagged tokens:
#=> [(Word @tag="JJ", @text="jolly"),
     (Word @tag="NN", @text="face")]
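One way the tag-regex idea might work: render each tagged token as {TAG}, run the regex over that string, and map each match back to the tokens it covers. A minimal sketch under that assumption; `Word` mirrors the slide's output and `match_tag_pattern` is hypothetical. The example pattern adds {NN} so that "jolly face" (JJ followed by NN) matches, since the slide's pattern only allows NNS or NNP after the adjective, and it escapes the braces so they are unambiguously literal.

```ruby
Word = Struct.new(:tag, :text)

# Find runs of tokens whose tags match a regex over "{TAG}" strings
def match_tag_pattern(words, pattern)
  tag_string = +""
  starts = [] # character offset where each token's "{TAG}" begins
  words.each do |word|
    starts << tag_string.length
    tag_string << "{#{word.tag}}"
  end

  matches = []
  pos = 0
  while (m = pattern.match(tag_string, pos))
    first = starts.index(m.begin(0))            # token where the match starts
    last = starts.rindex { |s| s < m.end(0) }   # last token the match touches
    matches << words[first..last] if first && last && last >= first
    pos = [m.end(0), m.begin(0) + 1].max        # always move forward
  end
  matches
end

PATTERN = /(\{JJ\}|\{JJS\})(\{NN\}|\{NNS\}|\{NNP\})/

words = [Word.new("DET", "a"), Word.new("JJ", "jolly"), Word.new("NN", "face")]
p match_tag_pattern(words, PATTERN).map { |run| run.map(&:text) }
# => [["jolly", "face"]]
```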
Sentimental value
epic: 1.0
good: 1.0
chance: 0.21875
brisk: 0.21875
slanderous: -1.0
piteous: -1.0
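The slide doesn't say how word values become a score for a whole message. A minimal sketch assuming the simplest option, summing the value of every known word (unknown words count as zero), with the six values above as a hypothetical lexicon; `sentiment_score` is an illustrative name.

```ruby
# Word values from the slide, used as a tiny hypothetical lexicon
SENTIMENT = {
  "epic" => 1.0, "good" => 1.0,
  "chance" => 0.21875, "brisk" => 0.21875,
  "slanderous" => -1.0, "piteous" => -1.0
}

# Sum the values of known words; unknown words contribute 0.0
def sentiment_score(text, lexicon = SENTIMENT)
  text.downcase.scan(/[a-z]+/).sum(0.0) { |word| lexicon.fetch(word, 0.0) }
end

p sentiment_score("A good chance of a brisk walk") # => 1.4375
p sentiment_score("An epic, piteous tale")         # => 0.0
```

Summing lets long messages accumulate large scores; dividing by the number of matched words is a common alternative if scores need to be comparable across lengths.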
Results
• Ruby
• Practical Object-Oriented Design in Ruby
• Doctors
• LRUG
• recruiters (!)

• dedicated servers
• PDFs
• Surrey

• unsolicited phone calls from r********s
• clients
• PayPal
• XML
• geeks
Gems
Text - Paul Battley’s box of tricks
Treat
Tokenizer
Punkt segmenter
Chronic - for extracting dates
Other things you can do / I didn't talk about
• Calculate text edit distance
• Extract entities using the Stanford libraries via RJB (the Ruby Java Bridge)
• Extract topic words (LDA)
• Keyword extraction: TF-IDF
• JRuby
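Edit distance is the first item on that list, and the Text gem mentioned earlier already provides it as `Text::Levenshtein`. To show what it computes, here is a minimal dynamic-programming sketch of Levenshtein distance (the number of single-character insertions, deletions and substitutions needed to turn one string into another) that keeps only one previous row; `levenshtein` is a hypothetical top-level method, not the gem's API.

```ruby
# Levenshtein distance via dynamic programming, keeping one row at a time
def levenshtein(a, b)
  prev = (0..b.length).to_a # distances from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |cb, j|
      deletion = prev[j] + 1
      insertion = curr[j - 1] + 1
      substitution = prev[j - 1] + (ca == cb ? 0 : 1)
      curr << [deletion, insertion, substitution].min
    end
    prev = curr
  end
  prev.last
end

p levenshtein("kitten", "sitting") # => 3
```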
Thank you for processing.
Questions?
@tomcartwrightuk

Thanks to Tim Cowlishaw and the HT dev
team for specialised rubber duck support
