Comparing social tags to microblogs


   Victoria Lai, Christopher Rajashekar, William Rand
              Modeling Social Media 2011
                    October 9, 2011
Social Tags and Social Media
 • Brand manager: what are people saying about a product online?
 • Goal: see if tags about an album reflect Twitter conversations
 • Amazon tags
    • Where purchases take place
    • Easier to collect than tweets
Similarity framework: S(fa(ta), fw(tw)) > θ
 [Diagram: album tags (ta) and keywords from album tweets (tw) are each scored by an importance measure (fa and fw), giving two weighted phrase lists (phrase 1 #, phrase 2 #, phrase 3 #, …) that are compared: S > θ?]
 • ta: all tags, top ten tags
 • fa: tag weights
 • fw: frequency, tf-idf
 • S: Spearman, Kendall tau, Precision, Recall
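
The comparison S(fa(ta), fw(tw)) > θ can be illustrated with a minimal sketch in plain Python. The slides do not say which variants of the coefficients were used, so this sketch assumes Spearman's rho computed on average ranks and Kendall's tau-a (no tie correction). The function names, the `theta` value, and the example weights are hypothetical and are not data from the study.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)


def similarity(tag_weights, keyword_weights, theta):
    """Compare two phrase -> weight maps on their shared phrases; test S > theta."""
    shared = sorted(set(tag_weights) & set(keyword_weights))
    x = [tag_weights[p] for p in shared]
    y = [keyword_weights[p] for p in shared]
    rho, tau = spearman(x, y), kendall_tau_a(x, y)
    return {"spearman": rho, "kendall": tau, "above_threshold": rho > theta}


# Made-up weights for illustration only (not data from the study).
tags = {"rock": 40, "indie": 25, "live": 10, "vinyl": 5}
keywords = {"rock": 0.9, "indie": 0.4, "live": 0.6, "vinyl": 0.1}
result = similarity(tags, keywords, theta=0.3)
```

Only phrases present in both lists are compared here, which is one possible design choice; phrases missing from one side would need an explicit rank before they could enter either coefficient.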
Baselines (θ)
 • General control
    • I, the, and, a, of
    • Used in tf-idf
 • Music control
    • music
    • Used as threshold
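
A rough sketch of how a general-control sample could feed tf-idf follows. The slides do not spell out the mechanism, so this assumes that tweets matching the general control words act as a background corpus for the idf term, and it uses one common smoothed idf, log((1 + N) / (1 + df)). The function name `tf_idf` and the example tweets are hypothetical.

```python
from collections import Counter
from math import log


def tf_idf(album_tweets, background_tweets):
    """Score each word in the album tweets by term frequency times smoothed idf."""
    # Term frequency: raw word counts across all album tweets.
    tf = Counter(word for tweet in album_tweets for word in tweet.lower().split())
    # Document frequency is counted over album tweets plus the background sample.
    documents = [set(t.lower().split()) for t in album_tweets + background_tweets]
    n_docs = len(documents)
    scores = {}
    for word, count in tf.items():
        doc_freq = sum(1 for doc in documents if word in doc)
        scores[word] = count * log((1 + n_docs) / (1 + doc_freq))
    return scores


# Made-up tweets for illustration only.
album_tweets = ["the new album is great", "love the album"]
background_tweets = ["the weather is nice", "i missed the bus"]
scores = tf_idf(album_tweets, background_tweets)
top_keyword = max(scores, key=scores.get)
```

Words that appear in every tweet, such as "the", score zero under this weighting, while words concentrated in the album tweets rise to the top.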
Relevant Work
 • Heymann, Ramage, and Garcia-Molina (2008)
    IR measures
 • Eck, Lamere, Bertin-Mahieux, and Green (2007)
    correlation measures
 • Wagner and Strohmaier (2010)
    tweet stream properties
 • Inouye and Kalita (2011)
    automatic tweet summarization
 • Wu, Zhang, and Ostendorf (2010)
    tf-idf on user tweets
Correlations
 C1 (threshold, music control): ta = all tags, fw = freq, tw = music
 C2 (base case):                ta = all tags, fw = freq
 C3 (best case):                ta = top tags, fw = tf-idf

 Album   C1 Spearman   C1 Kendall   C2 Spearman   C2 Kendall   C3 Spearman   C3 Kendall
 D1      0.44          0.38         0.29          0.25         0.69          0.43
 D2      0.29          0.24         0.38          0.37         0.78          0.70
 D3      0.24          0.20         0.38          0.33         0.33          0.31
 D4      0.30          0.26         0.40          0.35         0.60          0.51
 J1      0.64          0.55         0.31          0.28         0.31          0.28
 J5      0.20          0.18         0.23          0.18         0.63          0.44
 J6      0.47          0.37         0.28          0.19         0.63          0.45
 F2      0.24          0.20         0.43          0.36         0.30          0.28

 Shaded: strongest correlation listed
 C3 bolded: better than base case
Information Retrieval
 Album        Precision (P1)   Precision threshold (P2)   Recall
 D1           0.48             0.43                       0.002
 D2           0.24             0.62                       0.008
 D3           0.29             0.36                       0.001
 D4           0.36             0.36                       0.0004
 J1           0.20             0.50                       0.0003
 J3           0.00             0.75                       0.00
 J5           0.57             0.40                       0.0002
 J6           0.75             0.38                       0.0004
 F1           0.00             0.50                       0.00
 F2           0.67             0.59                       0.00009
 Average      0.35             0.49                       0.001
 HV average   0.51             0.45                       0.0003
 LV average   0.20             0.53                       0.002
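
For reference, set-based precision and recall can be sketched as below. The slides do not define which set plays which role, so this assumes the album tags are the retrieved set and the tweet keywords are the relevant set. Under that reading, recall is small whenever the tweet vocabulary is much larger than the tag list. The function name and the example sets are hypothetical.

```python
def precision_recall(tags, tweet_keywords):
    """Precision: share of tags found among tweet keywords.
    Recall: share of tweet keywords that are also tags."""
    tags, tweet_keywords = set(tags), set(tweet_keywords)
    matched = tags & tweet_keywords
    precision = len(matched) / len(tags) if tags else 0.0
    recall = len(matched) / len(tweet_keywords) if tweet_keywords else 0.0
    return precision, recall


# Made-up sets for illustration only: 4 tags against 1000 tweet keywords.
example_tags = {"rock", "indie", "live", "vinyl"}
example_keywords = {"rock", "live"} | {f"word{i}" for i in range(998)}
precision, recall = precision_recall(example_tags, example_keywords)
```

With two of the four tags matched against 1000 keywords, this gives a precision of 0.5 and a recall of 0.002.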
Conclusions
 • Tags are a good proxy for top content when there is sufficient Twitter activity
 • More relevant tags rank higher in tweet keyword rankings
 • TF-IDF is effective


Next Steps
 • Larger dataset
 • Analysis over time
 • Other sources like Last.fm
 • Linguistic analysis (clustering, stemming)
 • Other user-generated data (e.g. user reviews)
Questions?
