Joseph Orilogbon
Luis Lasierra
Bin Shen
Semantic Technologies in IBM Watson, 5/12/14
Discovering why Topics are Trending on Twitter
Watson presentation

  1. Discovering why Topics are Trending on Twitter. Joseph Orilogbon, Luis Lasierra, Bin Shen. Semantic Technologies in IBM Watson, 5/12/14.
  2. * We set out to explain why topics are trending on Twitter.
     * The main approach to achieve this was summarization.
  3. * News breaks on Twitter.
     * Twitter is a prominent way of expressing opinions on the Internet.
     * Why people are talking about a particular topic in a given location
     * Commercial interest
  4. * Summarization of trending topics on Twitter
     * Categorization of topics
     * Named-entity extraction for trending topics
  5. http://whytrend.intelworx.com
  6. * Speech-act-guided summarization
     * Phrase ranking using MLE
     * Phrase extraction using POS filtering
     * Salience score of extracted phrases
     * Summary generation using templates
  7. * Speech acts include Statement [sta], Question [que], Comment [com], Suggestion [sug], and Miscellaneous [mis].
     * Speech act classification is a multiclass problem.
     * A k-nearest-neighbors approach was used for classification.
  8. * Extracted phrases were ranked using the following equation: $\mathrm{Score}(P) = \log \dfrac{L(\text{words in } P \text{ are independent})}{L(\text{words in } P \text{ are dependent})}$
     * Dependence/independence was measured using a background Twitter corpus built from 550,000 tweets.
     * For lengths 1 to L, we extract the top 50 phrases.
     * L is a model parameter for the maximum phrase length.
  9. * Extracted n-grams are only useful if they are:
       * nouns or noun phrases
       * verbs or verb-centered phrases
     * After extracting n-grams, those not matching the required patterns were filtered out using regular expressions on their POS tag patterns.
     * Tagging was done before extracting n-grams to give the tagger the proper context.
     * Different patterns are suitable for different speech acts.
  10. * This is another round of phrase ranking, based on how "salient" the phrases are within the given topic.
      * The salience score is given as $SS(Ng_i) = GS(Ng_i) \times N_i$.
      * $N_i$ is the length of n-gram $Ng_i$.
      * $GS(Ng_i)$ is a graph score obtained by iterating over a graph $G = (V, E)$, where $V$ is the set of n-grams and $E$ is a set of edges weighted by the number of times the n-grams co-occur.
  11. * A greedy strategy was used to select the most salient phrases.
      * Phrases were used to fill templates.
      * Speech acts were used to describe how people are talking about the salient phrases.
      * Redundant phrases were detected using a Jaccard coefficient of 0.275.
      * Hashtags were split into words using an existing application.
  12. * The main reference is Zhang et al., 2013.
      * Speech acts were not used for filtering out tweets.
      * Two rounds of POS filtering were done, as opposed to one in the original paper.
      * A greedy strategy was used, as opposed to the round-robin strategy used in the original paper.
      * Representative tweets were also presented to give the user some sense of context.
  13. * Speech act training data set (Liu et al.) for speech act classification
      * Sentiment140 dataset for the background corpus
      * TweetMotif dataset (O'Connor et al., 2010) for the background corpus
      * Twitter NLP (Gimpel et al.) for POS tagging
      * Tweets collected via the Twitter API for testing the summarization model; see examples on the site.
  14. * Entity extraction
        * Preprocessing, proper noun extraction
        * Google Knowledge Graph: Freebase
      * Categorization
        * uClassify API
        * Extract the highest-ranking category
  15. * Front end
        * Auto-detection or manual selection of location
        * Displays trending topics
        * Sends requests to the server to analyze topics
      * Back end
        * Tweet retrieval
        * Analysis using the summarization model
        * Sends results to the Freebase and uClassify APIs
        * Caches results
  16. * Front end: HTML5, JavaScript, Google Maps API, AngularJS, jQuery
      * Back end: Java / Play Framework and MySQL database
      * Hosted on AWS
  17. * Asked users to provide feedback on results.
      * Questions covered all 3 parts of the project.
      * Got 19 responses as of the time of making this slide.
  18. Avg = 3.89; Avg = 4.00
  19. Avg = 4.21; Avg = 3.84
  20. Avg = 4.16
  21. * Liu, Fei, Yang Liu, and Fuliang Weng. "Why is SXSW trending?: Exploring multiple text sources for Twitter topic summarization." 2011. 66-75.
      * O'Connor, Brendan, Michel Krieger, and David Ahn. "TweetMotif: Exploratory Search and Topic Summarization for Twitter." 2010.
      * Zhang, Renxian, Wenjie Li, Dehong Gao, and You Ouyang. "Automatic Twitter Topic Summarization With Speech Acts." IEEE Transactions on Audio, Speech, and Language Processing 21 (2013): 649-658.
      * Gimpel, Kevin, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. "Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments." In Proceedings of ACL 2011.
      * Abeel, T., Y. Van de Peer, and Y. Saeys. "Java-ML: A Machine Learning Library." Journal of Machine Learning Research 10 (2009): 931-934.
  22. * Tweets under a topic are loosely grouped together, sometimes sharing little in common.
      * Low performance with speech act classification
      * Detection of the main entity
      * Normalization of tweets could at times produce odd results.
      * Twitter API limit: 180 search queries per user per application per 15 minutes
  23. * Real-time indexing of tweets before they start trending, using Lucene/ES or other full-text engines
      * Detection of sentence overlap in the selected phrases
      * Detecting redundancies semantically
      * Different templates for various topic categories
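
Slide 7 names k-nearest neighbors as the speech act classifier but does not give the features or the distance measure. The following is a minimal sketch only: the token-set Jaccard similarity, k = 3, and the hand-made training tweets are all illustrative assumptions, not the authors' setup.

```python
from collections import Counter

# Toy labelled tweets (hypothetical, for illustration only).
TRAINING = [
    ("what time does the game start ?", "que"),
    ("where is everyone watching the game ?", "que"),
    ("the game starts at 8 pm tonight", "sta"),
    ("the team won the game last night", "sta"),
    ("you should watch the game tonight", "sug"),
    ("that was an amazing game", "com"),
]


def tokens(text):
    """Lowercased whitespace tokens as a set (an assumed, simplistic feature)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(text, training=TRAINING, k=3):
    """Return the majority label among the k most similar training tweets."""
    query = tokens(text)
    ranked = sorted(
        training,
        key=lambda example: jaccard(query, tokens(example[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(classify("what time is the game ?"))  # que
```

With a real training set, the feature extractor and k would need tuning; slide 22 itself reports low performance for this step.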
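
The ranking score on slide 8 is a log ratio of two likelihoods, but the slide does not say how each likelihood is computed. One common way to fill that in, assumed here and not confirmed by the slides, is the binomial likelihood-ratio test for bigrams, with counts taken from a background corpus. The function and parameter names are hypothetical.

```python
import math


def _xlogy(count, prob):
    """count * log(prob), using the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * math.log(prob)


def _log_binomial(k, n, p):
    """Log-likelihood of k successes in n trials.

    The binomial coefficient is omitted because it cancels in the ratio.
    """
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def log_likelihood_ratio(c1, c2, c12, n):
    """log( L(independent) / L(dependent) ) for a bigram "w1 w2".

    c1, c2: background counts of w1 and w2; c12: count of the bigram;
    n: total number of tokens in the background corpus.
    """
    p = c2 / n                  # independence: w2 is equally likely after any word
    p1 = c12 / c1               # dependence: P(w2 | w1)
    p2 = (c2 - c12) / (n - c1)  # dependence: P(w2 | not w1)
    independent = _log_binomial(c12, c1, p) + _log_binomial(c2 - c12, n - c1, p)
    dependent = _log_binomial(c12, c1, p1) + _log_binomial(c2 - c12, n - c1, p2)
    return independent - dependent


# Hypothetical counts: a strongly associated pair versus a weakly associated one.
print(log_likelihood_ratio(100, 100, 90, 1000))
print(log_likelihood_ratio(100, 100, 15, 1000))
```

Under this formulation the value is zero when the words are exactly independent and becomes more negative as the evidence for dependence grows, so the ranking direction would have to be chosen accordingly.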
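
The POS filtering on slide 9 can be sketched as follows. The two regular expressions are hypothetical, since the slides do not list the actual patterns; the tags follow the coarse Twitter POS tagset (N common noun, ^ proper noun, V verb, A adjective, D determiner). The input is assumed to be already tagged, matching the slide's point that tagging precedes n-gram extraction.

```python
import re

# Hypothetical patterns over space-joined tag sequences.
NOUN_PHRASE = re.compile(r"(A )*(N|\^)( (N|\^))*")
VERB_PHRASE = re.compile(r"V( D)?( A)*( (N|\^))*")


def ngrams(tagged, max_len):
    """Yield every contiguous run of 1..max_len (word, tag) pairs."""
    for n in range(1, max_len + 1):
        for start in range(len(tagged) - n + 1):
            yield tagged[start:start + n]


def filter_phrases(tagged, max_len=3, patterns=(NOUN_PHRASE, VERB_PHRASE)):
    """Keep n-grams whose whole tag sequence matches one of the patterns."""
    kept = []
    for gram in ngrams(tagged, max_len):
        tag_string = " ".join(tag for _, tag in gram)
        if any(pattern.fullmatch(tag_string) for pattern in patterns):
            kept.append(" ".join(word for word, _ in gram))
    return kept


tagged_tweet = [
    ("the", "D"), ("new", "A"), ("iphone", "^"),
    ("launch", "N"), ("is", "V"), ("amazing", "A"),
]
print(filter_phrases(tagged_tweet))
```

Passing a different `patterns` tuple per speech act would be one way to reflect the slide's remark that different patterns suit different speech acts.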
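
Slide 10 defines the salience score as a graph score multiplied by the n-gram length, without naming the iteration used for the graph score. The sketch below assumes a weighted PageRank-style update with damping 0.85 and a fixed number of iterations; both choices, and the example edge weights, are assumptions rather than details from the slides.

```python
from collections import defaultdict


def graph_scores(edge_weights, damping=0.85, iterations=50):
    """PageRank-style scores on an undirected, weighted co-occurrence graph.

    edge_weights maps (ngram_a, ngram_b) pairs to co-occurrence counts.
    """
    neighbors = defaultdict(dict)
    for (a, b), weight in edge_weights.items():
        neighbors[a][b] = weight
        neighbors[b][a] = weight
    total = {node: sum(nbrs.values()) for node, nbrs in neighbors.items()}
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # Each node receives a share of every neighbor's score,
        # proportional to the weight of the edge between them.
        scores = {
            node: (1 - damping) + damping * sum(
                scores[other] * weight / total[other]
                for other, weight in nbrs.items()
            )
            for node, nbrs in neighbors.items()
        }
    return scores


def salience_scores(edge_weights):
    """Graph score times n-gram length, with length counted in words."""
    return {
        ngram: score * len(ngram.split())
        for ngram, score in graph_scores(edge_weights).items()
    }


# Hypothetical co-occurrence counts for three n-grams.
edges = {("world cup", "brazil"): 5, ("world cup", "goal"): 2}
print(salience_scores(edges))
```

Multiplying by length favors longer n-grams, which tend to be more informative in a summary than single words with the same graph score.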
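
The greedy selection with redundancy detection on slide 11 might look like the sketch below. It assumes the Jaccard coefficient is computed over word sets and that a phrase counts as redundant when its coefficient with an already selected phrase exceeds 0.275; the slides give the threshold but not these details. The salience values are made up.

```python
def jaccard(phrase_a, phrase_b):
    """Jaccard coefficient between the word sets of two phrases."""
    a, b = set(phrase_a.split()), set(phrase_b.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_phrases(salience, max_phrases=3, threshold=0.275):
    """Greedily pick the most salient phrases, skipping redundant ones."""
    selected = []
    for phrase in sorted(salience, key=salience.get, reverse=True):
        if len(selected) == max_phrases:
            break
        # Keep the phrase only if it is dissimilar to everything chosen so far.
        if all(jaccard(phrase, kept) <= threshold for kept in selected):
            selected.append(phrase)
    return selected


# Hypothetical salience scores.
salience = {
    "world cup final": 3.0,
    "world cup": 2.5,
    "brazil goal": 2.0,
    "goal": 1.0,
}
print(select_phrases(salience))  # ['world cup final', 'brazil goal']
```

Here "world cup" is dropped because it shares two of three words with "world cup final", and "goal" is dropped because it overlaps "brazil goal".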
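
Slide 22 lists the Twitter API limit of 180 search queries per 15 minutes as a constraint. A client could guard against exceeding it with a local limiter such as the sketch below. This is a hypothetical client-side helper using a sliding window; it does not describe how Twitter enforces the limit or how the project handled it.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls within any window_seconds interval."""

    def __init__(self, max_calls=180, window_seconds=15 * 60, clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock          # injectable so the limiter can be tested
        self._calls = deque()        # timestamps of recent calls, oldest first

    def try_acquire(self):
        """Record a call and return True if it is allowed, else return False."""
        now = self._clock()
        # Forget calls that have fallen out of the window.
        while self._calls and self._calls[0] <= now - self._window:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            self._calls.append(now)
            return True
        return False


limiter = SlidingWindowLimiter()
if limiter.try_acquire():
    print("safe to send a search query")
```

Combined with the result caching mentioned on slide 15, a guard like this would let the back end skip or delay a query instead of receiving a rate-limit error.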