A weekend software hack called “Movie Hack Attack”.
Video content is played and analyzed in realtime for sentiment, emotions, and more.
Sentiment is shown in a chart below the video; emotions and objects of attention appear to the left, and persons/locations/organizations to the right, each illustrated with a random picture grabbed from Google image search.
2. Aim of the Hack
(As it’s really only a small hack, the real aim is slightly less grand:
playing with Python as a REST API/server and making some nice stuff in a
couple of days… and not spending time on that wasteful activity called “sleep”)
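The “Python as a REST API/server” part can be sketched with nothing but the standard library. The hack’s actual framework, routes, and payload shapes aren’t documented here, so everything below (the `/analyze` endpoint, the JSON fields) is made up for illustration:

```python
# Minimal sketch of a subtitle-analysis REST endpoint using only the
# Python standard library. Route and payload shape are illustrative,
# not the hack's actual API.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def analyze(text):
    # The real analysis (sentiment, NER, emotions) would plug in here;
    # this sketch just tokenizes on whitespace.
    return {"text": text, "tokens": text.split()}

class AnalyzeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(analyze(payload.get("text", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("localhost", 8000), AnalyzeHandler).serve_forever()
```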
❖ Provide extra content information to users while watching a movie or TV show:
❖ Moods and sentiment of a movie
❖ Persons and places mentioned or featured in a movie
3. and now… The Hack!
Persons/Locations/Organizations
Objects/Expressions

In this case the phrase “mandarin of television” is recognized as an Expression. The words “Mandarin” and “Television” are recognized individually, and matching pictures are searched for and shown.

In this case the name “Howard Beale” is recognized as a “Person”.
4. Network (1976) [Movie]
Sentiment
Current positive or negative sentiment of each sentence, plotted over time.
(You can see it needs more normalization for long video items.)
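One simple way to normalize the over-time chart for long items is to smooth the per-sentence scores with a trailing moving average. This is a sketch of one possible approach, not necessarily what the hack does:

```python
# Sketch: smoothing per-sentence sentiment with a trailing moving
# average, one possible way to normalize the over-time chart.
def moving_average(scores, window=3):
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

print(moving_average([1.0, -1.0, 1.0, 1.0], window=2))  # → [1.0, 0.0, 0.0, 1.0]
```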
5. Obama 2012 victory speech [Talk]
Persons/Locations/Organizations
In this case the word “America” is recognized as the Location “USA”.

Emotions
In this case the word “Fall” is recognized as the emotion “Triumph” (because of the context of the sentence).

The pictures are random matches on the specific concept, grabbed from Google Images/Flickr in realtime.
6. Language Tech
(Don’t worry, only 2 slides about tech… but c’mon, it’s a hackathon, isn’t it?)
❖ Analysis of subtitles is done through:
❖ Language pre- and post-processing (tokenize, remove stopwords and punctuation, etc.) [nltk]
❖ Part-of-Speech (POS) tagging for identifying the grammar of a sentence [nltk POS tagger + Brown corpus]
❖ Named Entity Recognition (NER) for identifying “objects” of interest: persons, locations, organizations [Stanford NER tagger]
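The preprocessing step above can be sketched in plain Python. The hack uses nltk (which needs corpus downloads), so this version only mirrors the same steps with an illustrative mini stopword list; the nltk equivalents are noted in comments:

```python
# Sketch of the preprocessing step (tokenize, drop stopwords and
# punctuation). The hack uses nltk; this plain-Python version mirrors
# the same steps without needing any corpus downloads.
import re

STOPWORDS = {"the", "a", "an", "of", "is", "and", "to", "in"}  # tiny illustrative set

def preprocess(sentence):
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]

# With nltk, the equivalent pipeline would be roughly:
#   tokens = nltk.word_tokenize(sentence)
#   tagged = nltk.pos_tag(tokens)       # POS tags
#   tree   = nltk.ne_chunk(tagged)      # PERSON/GPE/ORGANIZATION chunks

print(preprocess("The mandarin of television is angry"))  # → ['mandarin', 'television', 'angry']
```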
7. Language Tech (2)
❖ Analysis of subtitles is done through:
❖ Sentiment extraction through a trained sentiment model, adapted/hacked to be more applicable to movie data [SentiWordNet (a sentiment-annotated lexicon built on Princeton’s WordNet) + hack]
❖ Matching emotions through many different techniques [Princeton’s WordNet annotated synset lexicon, all previous steps, and many… many… hacks]
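Lexicon-based sentiment in the SentiWordNet style can be sketched like this: each word carries a (positive, negative) score pair, and a sentence score is the summed positive minus the summed negative. The tiny lexicon below is made up for illustration; in the real hack the scores come from SentiWordNet’s annotated synsets:

```python
# Sketch of lexicon-based sentence sentiment in the SentiWordNet style.
# The tiny lexicon below is made up; real (pos, neg) scores come from
# SentiWordNet's annotated synsets.
LEXICON = {
    "good":  (0.75, 0.0),
    "happy": (0.80, 0.0),
    "bad":   (0.0, 0.65),
    "angry": (0.0, 0.70),
}

def sentence_sentiment(tokens):
    pos = sum(LEXICON.get(t, (0.0, 0.0))[0] for t in tokens)
    neg = sum(LEXICON.get(t, (0.0, 0.0))[1] for t in tokens)
    return pos - neg  # > 0: positive, < 0: negative

print(sentence_sentiment(["howard", "beale", "is", "angry"]))  # → -0.7
```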
8. Possible use cases
❖ Get sentiment and extracted emotional values from news broadcasts on different channels (e.g. Al Jazeera, CNN, Russia Today) and get a quick indication of their specific viewpoint (or “bias”) on a “news event”;
❖ Filter content by emotional thresholds (today I only want to read “happy” news / items with an overall positive sentiment / emotional values);
❖ Plug movie/video content into the Semantic Web by linking extracted subtitle entities/chunks to their specific ontologies (adding ref header tags to movie information pages);
❖ Enable richer user interaction through adding extra meta information to existing content and user interfaces;
❖ Enable smart semantic (textual) searching of non-textual content through feature extraction by some of the technologies showcased here.
❖ (…)
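The “happy news only” use case reduces to a threshold filter over scored items. A minimal sketch, with made-up titles and scores:

```python
# Sketch of the "happy news only" filter: keep items whose sentiment
# score clears a threshold. Titles and scores are made-up examples.
def filter_by_sentiment(items, threshold=0.0):
    # items: iterable of (title, sentiment_score) pairs
    return [title for title, score in items if score > threshold]

news = [("Local team wins", 0.8), ("Market crashes", -0.6), ("Sunny weekend", 0.4)]
print(filter_by_sentiment(news))  # → ['Local team wins', 'Sunny weekend']
```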