1. Sentiment Analysis During the World Cup Using Big Data
© 2014 IBM Corporation
IBM Research – Brazil
Alan Braz – IBM Research @alanbraz
2. Alan Braz
IBM Research – Brazil
Research Software Engineer
2002:2005 UNICAMP – BSc in Computer Science
2005aug:2005nov IBM GBS – Java developer intern
2005:2007 IBM GBS – Java developer (WWER)
2007:2010 IBM GBS – Technical leader (eAC)
2009:2012 IBM GBS – Agile coach and instructor (GenO)
2009:today Metrocamp – SE, RUP, Agile grad teacher
2010:2012 IBM GBS – Software Architect (Blue Community)
2009:2013 UNICAMP – MSc Agile Software Engineering
2013feb:today IBM Research Brazil as RSE
www.alanbraz.com.br
@alanbraz
3. Innovation and Comfort
Trial-and-Error:
– start-ups
Science-Based:
– scientific method (empirical)
– logical deduction (mathematics)
[Diagram labels: RADICAL INNOVATION, INNOVATION]
4. Science-based innovation
5. The World is our Lab: 12 Labs Worldwide in 10 Countries
Labs shown on the map: Almaden, Watson, China, Austin, Israel, Japan, Switzerland, India, Ireland, Australia, Africa
IBM Research worldwide has 1600+ PhDs with a diversity of disciplines: Behavioral Science, Chemistry, Electrical Engineering, Computer Science, Materials Science, Mathematical Science, Physics, Services Science
9. IBM Research – Brazil
Smarter Natural Resources: natural resources modeling, analytics, and logistics.
Systems of Engagement and Insights: systems of engagement and insights.
Social Data Analytics: analytics and modeling of social and human data and applications.
Smarter Devices: micro/nanotechnologies aimed at addressing smarter planet challenges.
Locations: São Paulo, Rio de Janeiro
A team of world-class researchers in close connection to the other 12 IBM Research labs and to the world’s best scientific, academic, and development communities.
10. System U: Modeling People from Social Media
Social behaviors (e.g., when tweeting)
Five Factor Model
•Openness
•Conscientiousness
•Extroversion
•Agreeableness
•Neuroticism
Ford’s 12 “Universal Needs”
•Structure
•Challenge
•Excitement
•Liberty
•Harmony
•Closeness
•Practicality
•Self-expression
•Curiosity
•Ideals
•Love
•Stability
Five Values
•Self-transcendence
•Conservation
•Self-enhancement
•Hedonism
•Openness-to-Change
11. Project: Social Media Behavior Simulation
Maira Gatti, Ana Appel, Claudio Pinhanez, Rogério de Paula, Cicero dos Santos, Alexander Rademaker, Paulo Cavalin, Samuel Barbosa, Daniel Gribel
Goal: to create a tool for companies to explore the impact and result of social media actions through simulation.
Applications:
– exploration of effort size and impact of marketing campaigns;
– determination of counter-information measures in viral media outbreaks.
Simulation of Obama/Romney Twitter campaigns in the last month before election
Obama’s Network: 23,856,961 followers
Romney’s Network: 1,675,792 followers
Sample - Sept 22 to Oct 29, 2012
Obama’s Network: 5.6M tweets, 24,526 active users, 3,594 followers
Romney’s Network: 5.1M tweets, 28,145 active users, 5,498 followers
14. Ei! 194 Million Brazilians Helping their National Team’s Coach
An app made specifically for one person: Luiz
Felipe Scolari, coach of the Brazilian national
soccer team.
Ei! is an app that identifies, filters and analyzes all
the Twitter comments that Brazilians have made
during the games.
With the touch of a button, Scolari will know what the country's consensus is:
At half time: which players the audience likes and hates, what changes should be made, which tactics should be explored, which player needs to be introduced…
After the game: his country’s perspective on the team, the players, and his performance as a coach.
15. Challenges
•Real-time issues
• Up to 5 million tweets per match
• Up to 20 thousand tweets per minute
• Texting vs. writing: casual language
• nao disse , Balotelli ia meter gol hoje , um golaço ainda , madero aquele negoo
• hora de colocar o Leandro né Felipão ? u.u
• vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira , brasil nao tomava gol de p### de chile não viu
• jah to vendo o Brasil faze nois passa vergonha na copa ! ! ! pq meu g-zuis ...
• acho q o ronaldinho tem que ser totula
• Com todo o respeito , Luis Fabiano , popcorn men hahahahaha beijo para quem
entendeu , pior piada ever ! Haha
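The casual-language challenge above (abbreviations such as "q", "pq", "hj", "mt", and stretched words such as "PAAAAAAAARTIU") can be illustrated with a small normalization step. This is only a sketch under stated assumptions: the slides do not say how Ei! or FAMA normalize text, and the abbreviation table and function names below are hypothetical.

```python
import re

# Hypothetical abbreviation table (an assumption for illustration only);
# a real system would need a much larger, curated list.
ABBREVIATIONS = {
    "q": "que",
    "pq": "porque",
    "hj": "hoje",
    "mt": "muito",
    "nao": "não",
}


def collapse_elongation(token: str) -> str:
    """Collapse runs of three or more identical characters into one."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def normalize(tweet: str) -> str:
    """Lowercase, collapse stretched letters, and expand known abbreviations."""
    tokens = tweet.lower().split()
    tokens = [collapse_elongation(t) for t in tokens]
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


print(normalize("PAAAAAAAARTIU ASSISTIR JOGO"))
print(normalize("acho q o ronaldinho tem que ser totula"))
```

The collapsing rule is deliberately crude: it would also turn "burrro" into "buro" rather than "burro", which is one reason a learned classifier is trained on the raw casual text instead of relying on rules alone.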
16. Social Sentiment Analysis is Difficult
(CHEvATM) Diego costa merece errar por ter escolhido outra seleçao pra jogar
(BRAvITA) Itália perdendo o segundo jogador lesionado com TRINTA minutos de jogo.
Prandelli deve tá jogando o Football Manager 2013.
(BRAvITA) PAAAAAAAARTIU ASSISTIR JOGO DO Brazil!
(BRAvITA) Vacilo, Jô ia entrar e fazer mais um
(BRAvMEX) o que aconteceu com a seleção ? Pqp
(BRAvURU) no momento dançando show das poderosas de sutiã e short jeans
(RMAvATM) BALE AMOR FAÇA AQUELE LINDO GOL QUE PROMETEU PRA MIM
ONTEM A NOITE
(BRAvMEX) Brazil vai ganhando do México, vingando-se das Olimpíadas, num jogo que
vale tanto quanto troco em bala.
(SAOvCOR) o ganso so quer fazer jogada genial
(SAOvCOR) Com essa Fabulosa em campo o Sao Paulo sempre vai fazer gol contra o
Corinthians, entenda tecnico retranqueiro do c#######
(SAOvCOR) Mano meu pai ganho 500 conto no jogo do bixo kkkk
17. Ei! Social Sentiment Solution
18. InfoSphere Streams: A Platform for Real Time Analytics on BIG Data
Key Big Data Challenge – Velocity
Volume: terabytes per second, petabytes per day
Variety: all kinds of data, all kinds of analytics
Velocity: insights in microseconds
Capabilities: powerful analytics, millions of events per second, microsecond latency, real-time delivery, traditional / non-traditional data sources
Example applications: algorithmic trading, telco churn prediction, smart grid, cyber security, government / law enforcement, ICU monitoring, environment monitoring
19. http://www.ibm.com/developerworks/analytics/
21. Streams Runtime Illustrated
Optimizing scheduler assigns processing elements (PEs) to hosts, and continually manages resource allocation
Commodity hardware – laptop, blades or high performance clusters
Dynamically add hosts and jobs
New jobs work with existing jobs
[Diagram: operator graph with labels including Meters, Company Filter, Usage Model, Usage Contract, Temp Action, Text Extract, Degree History, Compare History, Store History, Season Adjust, and Daily Adjust, running across five x86 hosts]
22. Ei! is Built on FAMA: Real-Time Social Media Polarity Analysis Tool for Portuguese Language
FAMA is a social sentiment analysis tool for the Portuguese language developed by IBM Research - Brazil.
FAMA processes text related to topics of interest which appear in social media (Twitter, Facebook, ReclameFacil, etc.) or in private text repositories such as customer complaints or call center logs.
FAMA can determine polarity related to the topics of interest: positive, negative, or neutral.
FAMA can find the most commonly used terms and their co-occurrences with the topics of interest.
“FAMA”: Roman goddess of gossip and rumor (Pheme in Greek mythology)
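Finding the terms that co-occur with a topic of interest can be sketched as a simple count over posts that mention the topic. This is an illustrative sketch, not FAMA's implementation, which the slides do not describe; the stop-word list, the function name, and the sample posts are invented for the example.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOPWORDS = {"o", "a", "de", "que", "e", "é", "do", "da", "em", "um", "pra", "com"}


def cooccurring_terms(posts, topic, top_n=3):
    """Count terms that appear in the same post as `topic`, most frequent first."""
    counts = Counter()
    for post in posts:
        tokens = [t for t in post.lower().split() if t.isalpha()]
        if topic in tokens:
            # Count each term at most once per post.
            counts.update(t for t in set(tokens) if t != topic and t not in STOPWORDS)
    return counts.most_common(top_n)


# Invented sample posts, not taken from the real data set.
sample_posts = ["neymar fez gol", "gol de neymar", "fred perdeu gol"]
print(cooccurring_terms(sample_posts, "neymar", top_n=1))
```

Counting each term once per post keeps a single repetitive post from dominating the ranking; a production system would also need the casual-language handling discussed in the Challenges slide.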
23. FAMA: Real-Time Social Media Polarity Analysis in Portuguese
[Architecture diagram of FAMA: stream computing (InfoSphere Streams), text classifier, learned database, classified database, JSONs, text analytics dashboard, user interface]
24. Construction of the Learned Database from Manual Analysis of Tweet Samples
The data for the learned database is created by manual inspection of tweets:
– about 2000 tweets from 4 friendly matches
– 15 different coders with different degrees of interest and knowledge of soccer
– uses a tool to display, collect, and process the data.
25. FAMA Analysis of a Tweet: Example of Text Classification
Tweet: vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira , brasil nao tomava gol de p### de chile não viu
Tokens: vou | ser | repetitivo | de novo | , | mas | : | na | minha | epoca | de | jovem | torcedor | da | seleção | brasileira | brasil | nao | tomava | gol | de | p### | de | chile | não | viu
feature: bad word
verbs: vou, ser, tomava
nouns: epoca, brasil, gol, chile, seleção
adjectives: repetitivo, jovem, brasileira, palavrão
vou: ir (to go)
ser: ser (to be)
tomava: tomar (suffer)
p###: palavrão (bad word)
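The analysis steps on this slide (tokenize, map words to lemmas, tag parts of speech, flag a bad word) can be sketched with lookup tables. The tables below contain only the entries shown on the slide; the function name, the punctuation set, and the whitespace tokenizer are assumptions, and the sketch does not treat "de novo" as a single token the way the slide does.

```python
# Lemma and part-of-speech entries copied from the slide's example.
# A real system would use full lexicons; all names here are hypothetical.
LEMMAS = {"vou": "ir", "ser": "ser", "tomava": "tomar", "p###": "palavrão"}
VERBS = {"vou", "ser", "tomava"}
NOUNS = {"epoca", "brasil", "gol", "chile", "seleção"}
ADJECTIVES = {"repetitivo", "jovem", "brasileira", "palavrão"}
PUNCTUATION = {",", ":", "!", "?", "."}


def extract_features(tweet: str) -> dict:
    """Split on whitespace, drop punctuation, and look up each token."""
    tokens = [t for t in tweet.split() if t not in PUNCTUATION]
    lemmas = [LEMMAS.get(t, t) for t in tokens]
    return {
        "verbs": [t for t in tokens if t in VERBS],
        "nouns": [t for t in tokens if t in NOUNS],
        "adjectives": [l for l in lemmas if l in ADJECTIVES],
        "has_bad_word": "palavrão" in lemmas,
    }


tweet = ("vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor "
         "da seleção brasileira , brasil nao tomava gol de p### de chile não viu")
print(extract_features(tweet))
```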
26. FAMA (2013): Social Sentiment Analysis with a Naïve Bayes Classifier
Learning a Classifier: a supervised learning algorithm takes a manually annotated corpus and produces a Naive Bayes classifier, function H.
Manually annotated corpus (examples):
hj vai dar Brazil!, positive
Felipão é mt burrro, negative
O jogo começa as 16h, neutral
Sentiment Analysis: function H maps a new post to positive, neutral, or negative.
Example input: neymar ta jogando mt hj!!!
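To make the "learning a classifier" step concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, trained on the three annotated examples from the slide. It is a textbook formulation, not FAMA's code; the function names and the whitespace tokenizer are assumptions.

```python
import math
from collections import Counter, defaultdict


def train(corpus):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in corpus:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# The three annotated examples shown on the slide.
corpus = [
    ("hj vai dar Brazil!", "positive"),
    ("Felipão é mt burrro", "negative"),
    ("O jogo começa as 16h", "neutral"),
]
model = train(corpus)
print(classify(model, "O jogo começa"))
```

Three examples are far too few for reliable predictions on unseen posts; the slides describe a learned database built from about 2000 manually labeled tweets.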
27. Game - Timeline
28. Confederations Cup Final: Brazil 3x0 Spain
29. Players and Main Topics
30. Players and Main Topics
Inspired by Social Media Streams (former TwitterVis)
http://arena1.watson.ibm.com:8080/cav/
33. www.craquedasredes.com.br
The social sentiment analysis technology, developed by IBM Brasil, analyzes what is being posted on social networks about any topic, company, or person, without the need for a hashtag.
All public posts in Portuguese are captured by a high-technology IBM system with artificial intelligence, which is trained to learn to interpret whether the sentiment of each post is positive, neutral, or negative.
This technology is capable of analyzing posts on diverse subjects and of diverse natures, including slang, sarcasm, and colloquial language.
35. Limitations of Naive Bayes Approach - Extra Labeling Needed
Brazil x Uruguay – Semi-final
Penalty kick for Uruguay
- David Luiz committed it
- Júlio César defended it
Naive Bayes results:
- David Luiz committed: too much neutral
- Júlio César defended: too much neutral, too much negative
36. Deep Learning Applied to Social Sentiment Analysis
Learning a Deep Learning Classifier: a deep learning algorithm takes a large-scale non-annotated corpus and a manually annotated corpus and produces a multi-layer neural network, function N.
Manually annotated corpus (examples):
hj vai dar Brazil!, positive
Felipão é mt burrro, negative
O jogo começa as 16h, neutral
Sentiment Analysis: function N maps a new post to positive, neutral, or negative.
Example input: neymar ta jogando mt hj!!!
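As a rough picture of what a network-based "function N" computes, the sketch below runs the forward pass of a very small convolutional text classifier: word vectors, one convolution layer with max-over-time pooling, and a softmax output. This is not the Deep FAMA architecture, which the slides do not describe; every weight, word vector, and name below is hand-set and hypothetical, whereas a real model would learn them from the annotated and non-annotated corpora.

```python
import math


def conv_max_pool(vectors, filt):
    """Apply one filter to every window of word vectors; keep the largest ReLU response."""
    width, dim = len(filt), len(filt[0])
    padded = vectors + [[0.0] * dim] * max(0, width - len(vectors))
    responses = []
    for start in range(len(padded) - width + 1):
        window = padded[start:start + width]
        total = sum(w * x for row, vec in zip(filt, window) for w, x in zip(row, vec))
        responses.append(max(0.0, total))
    return max(responses)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(tokens, embeddings, filters, output_weights, labels):
    """Embed tokens, extract one pooled feature per filter, and score each label."""
    dim = len(filters[0][0])
    vectors = [embeddings.get(t, [0.0] * dim) for t in tokens]
    features = [conv_max_pool(vectors, f) for f in filters]
    scores = [sum(w * x for w, x in zip(row, features)) for row in output_weights]
    probs = softmax(scores)
    return labels[probs.index(max(probs))], probs


# Hand-set toy parameters (hypothetical, not learned).
EMBEDDINGS = {"bom": [1.0, 0.0], "ruim": [0.0, 1.0]}
FILTERS = [[[1.0, 0.0]], [[0.0, 1.0]]]  # two filters, each one word wide
OUTPUT_WEIGHTS = [[1.0, 0.0], [0.2, 0.2], [0.0, 1.0]]
LABELS = ["positive", "neutral", "negative"]

label, probs = forward("jogo bom".split(), EMBEDDINGS, FILTERS, OUTPUT_WEIGHTS, LABELS)
print(label, probs)
```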
37. Brazil x Uruguay – Improvements with Deep Learning
Penalty kick for Uruguay
- David Luiz commits it
- Júlio César defends it
[Charts compared: Naive Bayes vs. Deep CNN]
38. Brazil x Uruguay – Improvements with Deep Learning on Players Scores
Events: David Luiz commits penalty; Julio Cesar defends penalty
[Charts compared: Naive Bayes (FAMA) vs. Deep CNN (Deep FAMA)]
39. Deep FAMA Covering All 64 Games of World Cup 2014
• all 64 WC’14 games
• 53M posts processed
• 34M posts about the games
• peak of 72K/minute
• 5.8M different users
• delivered by a team composed of Research, GBS, GTS, SWG, and Software Lab BR
• uses full IBM portfolio:
  • InfoSphere Streams
  • WebSphere
  • DB2
  • Cognos BI
• all running on SoftLayer
40. Brazil 1x7 Germany: Social Anatomy of the Largest Event in SN History
globally 35.6M tweets (WR)
6.8M posts in Portuguese (19% of world)
peak of 72K/minute (after 5th goal)
1.4M tweets after the game
[Timeline chart annotations: 5th goal peak of 72K/minute; David Luiz interview; positive effects]
David Luiz saves the image of Brazil after the game: without the 271K positive comments about David Luiz's interview, Brazil's post-game positive posts would decrease from 44% to 25%
First half (1.7M posts): 32% 13% 55%
Entire game (4.4M posts): 33% 13% 54%
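The percentages on this slide can be checked against its own rounded figures. The short calculation below is a sketch that assumes the 44% and 25% shares are both taken over the 1.4M post-game tweets; the slide does not state the denominator, and the variable names are illustrative.

```python
# Figures as printed on the slide (rounded).
world_tweets = 35.6e6
portuguese_posts = 6.8e6
post_game_tweets = 1.4e6
positive_share = 0.44
interview_positives = 271e3

# Share of Portuguese posts in the worldwide total.
portuguese_share = portuguese_posts / world_tweets

# Positive share after subtracting the interview comments,
# keeping the 1.4M post-game total as the denominator (assumption).
positives = positive_share * post_game_tweets
share_without_interview = (positives - interview_positives) / post_game_tweets

print(f"Portuguese share of world: {portuguese_share:.0%}")
print(f"Positive share without interview comments: {share_without_interview:.0%}")
```

Under that assumption the numbers agree with the slide's 19% and 25%. If the interview comments were also removed from the denominator, the result would be about 31% instead, so the slide's figure appears to keep the post-game total fixed.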
44. Results Used by TV Globo, ESPN, and TV Band
Globo 2nd screen app
1M downloads, 1.1M page views
ESPN Brazil
28K page views
45. Ei! Social Sentiment Solution
46. http://bigdatauniversity.com/bdu-wp/bdu-course/big-data-fundamentals/
47. https://www.coursera.org/course/mmds
48. www.bluemix.net
Articles and tutorials in Portuguese:
www.ibm.com/developerworks/br/
facebook.com/ibmbluemix
twitter.com/ibmbluemix
IBM Research – Brazil
http://www.research.ibm.com/brazil/
Alan Braz - alanbraz@br.ibm.com - @alanbraz