Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter




H. Purohit, Y. Ruan, A. Joshi, S. Parthasarathy, A. Sheth. Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter. in SoME 2011 (Workshop on Social Media Engagement, in conjunction with WWW 2011), March 29, 2011.

Paper: http://knoesis.org/library/resource.php?id=1095

More on Social Media @ Kno.e.sis at http://knoesis.org/research/semweb/projects/socialmedia/


  1. Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter
     March 29, 2011
     SoME 2011 (in conjunction with WWW 2011)
     Hemant Purohit¹, Yiye Ruan², Amruta Joshi², Srinivasan Parthasarathy², Amit Sheth¹
     ¹ Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, USA
     ² Dept. of Computer Science & Engineering, Ohio State University, USA
  2. Outline
     (What) User-Community Engagement
     (Why) Motivation
     (How) Problem Formalization
        Approach
        Terminology
        Definition
     Analysis Framework
        People-Content-Network Analysis (PCNA)
     Experiments
        Datasets and Event Categorization
        Features
        Results
     Insights
     Conclusion & Future Work
  3. User-Community Engagement
     Multiple topics surrounding events are discussed on social media.
     Each topic constitutes a community of users discussing it, e.g., the Japan Earthquake community.
     How do we understand the phenomenon of user participation (engagement) in topic discussions?
     Image: http://itcilo.wordpress.com
  4. Motivation
     User Engagement Analysis
     Business
        How do communities form during a product launch?
        What factors attract users to engage in these communities, thereby further spreading the message?
     Crisis Management
        Effective communication: how quickly can we disseminate information between resource providers and people in need of resources?
  5. Problem Formalization: Approach
     User engagement has been studied in many forms: community formation and detection, information propagation, link prediction, etc.
     It involves a three-dimensional dynamic:
        Content: the topic of interest,
        People: the participants who engage in discussion about the topic, and
        Network: the community structure formed around the topic discussion.
     Rather than limiting the analysis to one dimension, we propose a multidimensional approach.
     Case study on Twitter
  6. Earlier Approaches
     Content: Topic of Interest
     OR
     People: Participant of the discussion
     OR
     Network: Community around topic
     Images: tupper-lake.com/.../uploads/Community.jpg
             http://www.iconarchive.com/show/people-icons-by-aha-soft/user-icon.html
  7. Our Approach
     Content: Topic of Interest
     AND
     People: Participant of the discussion
     AND
     Network: Community around topic
     Images: tupper-lake.com/.../uploads/Community.jpg
             http://www.iconarchive.com/show/people-icons-by-aha-soft/user-icon.html
  8. Problem Formalization: Terminology
     Event-Oriented Community
        An implicit group of social network users who have joined the discussion (by posting messages) on a topic about an event.
     Slice
        The collection of messages relevant to the topic of discussion, posted during a fixed-length time window.
     Snapshot
        The state of the network at a certain point in time, at which user profile and connection information are crawled.
     Active Window: freshness matters!
     Active Community
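The slice definition above can be illustrated with a minimal sketch. The slides do not state the window length or the message format, so the `(timestamp, text)` tuples, the `start_time` argument, and the `window_seconds` value below are assumptions made purely for illustration.

```python
from collections import defaultdict


def assign_slices(messages, start_time, window_seconds):
    """Group (timestamp, text) pairs into fixed-length time windows.

    Hypothetical helper: timestamps are assumed to be seconds, and the
    window length is a free parameter (the slides do not give its value).
    """
    slices = defaultdict(list)
    for timestamp, text in messages:
        if timestamp < start_time:
            continue  # ignore messages posted before observation began
        index = int((timestamp - start_time) // window_seconds)
        slices[index].append(text)
    return dict(slices)
```

Each key of the returned dictionary is a slice index, so "a future slice" in the prediction problem would simply be a larger index than the one currently observed.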
  9. Problem Formalization: Definition
     Binary classification problem for user-community link prediction.
     User Engagement Prediction Problem: Given
        1) an event-oriented community C formed around a topic of discussion;
        2) a Twitter user U ∈ C,
     predict whether U will be engaged in C (by composing a new tweet or retweeting an existing tweet that contains keywords or hashtags related to C's underlying event) in a future slice. If so, U is said to be a positive record; otherwise, it is a negative record.
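As a rough illustration of how positive and negative records could be labeled under this definition, the sketch below marks a user as positive when any tweet in the future slice mentions an event term. The function name `label_record` and the case-insensitive substring matching are assumptions; the slides do not specify how keyword relevance was decided.

```python
def label_record(future_slice_tweets, event_terms):
    """Return 1 (positive record) if any tweet contains an event term, else 0.

    Hypothetical labeling rule: matching is a case-insensitive substring
    test, which is an assumption and may differ from the authors' method.
    """
    terms = [term.lower() for term in event_terms]
    for text in future_slice_tweets:
        lowered = text.lower()
        if any(term in lowered for term in terms):
            return 1
    return 0
```

Retweets need no special handling here, since a retweet carries the original text and therefore the same keywords or hashtags.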
  10. Analysis Framework: People-Content-Network Analysis (PCNA)
  11. Experiments: Dataset & Event Categorization
      Study on Twitter data
      Events have various characteristics, and we hypothesize that user engagement in them is affected by different variables.
      No standard event categorization is available, so we categorize events by observing data over time, as follows:
         Global (G) vs. Local (L) [e.g., Japan Earthquake vs. Iowa State Fair]
         Deterministic (D) vs. Unexpected (U) [e.g., Emmy Awards vs. Japan Earthquake]
         Compact (C) vs. Loose (Ls) [e.g., ISWC conference vs. Japan Earthquake]
         Transient (T) vs. Lasting (Lt) [e.g., President’s Speech vs. Egypt Revolution]
  12. Events (labeled as per our categorization)
      ClevelandShowPremiere: Second-season premiere of the animated TV series Cleveland Show. September 26. Global, loose, deterministic, transient.
      DiscoveryBuildingCrisis: Hostage crisis at the headquarters of Discovery Channel, Maryland. September 1. Local, loose, unexpected, transient.
      EmmyAwards: 62nd Prime-time Emmy Awards. August 29. Global, loose, deterministic, lasting.
      GoogleInstantSearch: Launch of Google Instant in the United States. September 8. Global, loose, unexpected, transient.
      HeismanTrophy: Reggie Bush’s announcement to forfeit the 2005 Heisman Trophy. September 14. Local, compact, unexpected, lasting.
      IowaStateFair: Iowa State Fair. August 12-22. Local, loose, deterministic, lasting.
      JewishNewYear: Jewish New Year 5771. September 8-10. Global, compact, deterministic, transient.
      LindsayLohanHearing: Lindsay Lohan’s hearing on probation revocation and verdict. September 24. Local, loose, deterministic, transient.
      LinuxCon: Annual convention organized by the Linux Foundation. August 10-12. Global, compact, deterministic, lasting.
      LondonTubeStrike: London tube strike. September 6. Local, loose, deterministic, transient.
      RichCroninDeath: Death of singer and songwriter Rich Cronin. September 8. Local, loose, unexpected, transient.
      ScottPilgrimRelease: Release of the movie Scott Pilgrim vs. the World. August 13. Global, loose, deterministic, lasting.
      SESSanFrancisco: Search Engine Strategies 2010 at San Francisco. August 16-20. Global, compact, deterministic, lasting.
      StuxnetWorm: Confirmation of the Stuxnet worm attack on the Iranian nuclear program. September 24. Global, loose, unexpected, lasting.
  13. Experiments: Features
      Organized in the PCNA framework: Node/Author features (P), Content features (C), Community features (N)
      Extracted for each potential community member (U) in each slice, where U belongs to the union of the follower lists of all active community members
      Figure labels: Whole Topic Community; Active Community; Potential new member (U); Followee of (U); Edge between A and B if B follows A
  14. Experiments: Features (cont.)
      Community features [characteristics of the active community/network under consideration]:
         wccSize: size of the weakly connected component (WCC) to which U’s friends belong in the active network.
         wccPercent: ratio of wccSize to the size of the active network.
         connectivity: number of active friends (i.e., followees) in the community.
         communitySize: size of the active community.
      Author features [characteristics of the friends that U is following]:
         Only friends in the active community are considered.
         logFollower: logarithm of follower count
         logFollowee: logarithm of followee count
         Klout [1]: an integrated measure of user influence and popularity
         Other profile information and activity history [2].
      [1] http://www.klout.com
      [2] Future work
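The community features on this slide can be sketched with a small union-find over the follow edges of the active network. Several details are assumptions, not taken from the slides: the function and argument names are hypothetical, the "size of the active network" is taken to equal the number of active members, and when U's friends fall into several components the largest one is used.

```python
from collections import Counter


def weakly_connected_components(nodes, edges):
    """Map each node to a component representative, ignoring edge direction."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b in edges:
        if a in parent and b in parent:  # skip edges leaving the active network
            parent[find(a)] = find(b)
    return {n: find(n) for n in nodes}


def community_features(active_members, follow_edges, followees_of_u):
    """Compute wccSize, wccPercent, connectivity and communitySize for user U.

    Assumption: if U's active friends span several components, wccSize is
    the size of the largest such component.
    """
    active_friends = followees_of_u & active_members
    component = weakly_connected_components(active_members, follow_edges)
    sizes = Counter(component.values())
    wcc_size = max((sizes[component[f]] for f in active_friends), default=0)
    community_size = len(active_members)
    return {
        "wccSize": wcc_size,
        "wccPercent": wcc_size / community_size if community_size else 0.0,
        "connectivity": len(active_friends),
        "communitySize": community_size,
    }
```

Both `active_members` and `followees_of_u` are expected to be sets; followees outside the active community are ignored, matching the slide's note that only friends in the active community are considered.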
  15. Experiments: Features (cont.)
      Content features [characteristics of tweets posted by active friends of U]:
         keywords: number of event-relevant keywords
         hashtags: number of event-relevant hashtags
         retweet: number of retweets
         mention: number of mentions
         url: number of relevancy-adjusted hyperlinks (an irrelevant hyperlink is counted as -1)
         subjectivity: subjectivity scores for words and emoticons
         Linguistic cues (LIWC [1] analysis): features of language usage; the top-3 transformed features are extracted using Principal Component Analysis (PCA)
      [1] http://www.liwc.net
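The count-based content features above might be extracted along the following lines. This is a sketch under stated assumptions: the tokenization, the "RT @" prefix test for retweets, and the caller-supplied `is_relevant_url` predicate are all hypothetical, and the subjectivity and LIWC features are omitted because they depend on external lexicons.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def content_features(tweets, event_keywords, event_hashtags, is_relevant_url):
    """Count keywords, hashtags, retweets, mentions and relevancy-adjusted URLs.

    Assumptions: keywords and hashtags are given in lowercase, a retweet is
    any tweet starting with "RT @", and URL relevance is decided by the
    caller-supplied predicate (irrelevant links contribute -1).
    """
    feats = {"keywords": 0, "hashtags": 0, "retweet": 0, "mention": 0, "url": 0}
    for text in tweets:
        urls = URL_PATTERN.findall(text)
        # Remove URLs first so their fragments are not counted as keywords.
        lowered = URL_PATTERN.sub(" ", text).lower()
        tokens = re.findall(r"[#@]?\w+", lowered)
        feats["keywords"] += sum(1 for t in tokens if t in event_keywords)
        feats["hashtags"] += sum(1 for t in tokens if t in event_hashtags)
        # Note: the user named in "RT @user" is also counted as a mention.
        feats["mention"] += sum(1 for t in tokens if t.startswith("@"))
        feats["retweet"] += 1 if lowered.startswith("rt @") else 0
        for url in urls:
            feats["url"] += 1 if is_relevant_url(url) else -1
    return feats
```

In the slides these counts are computed over tweets posted by the active friends of U, so `tweets` would be that subset of a slice rather than the whole slice.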
  16. Wait a minute!
      Not all content has been viewed!
         Novelty and attention: a user is likely to see new or recent content/tweets and then join the community.
         Apply temporal weighting to the features.
      Dataset imbalance: too many negative records!
         Alleviated by the SMOTE method
         Over-sampling of positive records and under-sampling of negative ones
      Not all users are active!
         Apply weighting on activity level based on last activity [1]
      [1] Future work
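To make the rebalancing step concrete, here is a simplified SMOTE-style sketch: each synthetic positive record is interpolated between a minority sample and one of its k nearest minority neighbours, and negatives are randomly under-sampled. The function names, the choice of k, the random selection of base samples, and the sampling ratios are assumptions; the slides do not report the settings actually used.

```python
import random


def smote_oversample(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority records by interpolating towards neighbours.

    Simplified variant: base samples are drawn at random rather than
    visited in turn. Requires at least two minority records.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority (negative) records without replacement."""
    return random.Random(seed).sample(majority, n_keep)
```

Because every synthetic point lies on a segment between two real positive records, the over-sampled class stays inside the region already occupied by positives instead of merely duplicating them.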
  17. Experiments
      We run the following experiment groups:
         allFeatures (All): contains all three feature groups
         onlyContent (Con.): contains only content features
         onlyAuthor (Aut.): contains only author features
         onlyCommunity (Com.): contains only community features
      SVM classifier
         LibSVM, RBF kernel, gamma=8, c=32
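For readers unfamiliar with the kernel setting, the RBF kernel is K(x, y) = exp(-gamma * ||x - y||^2), and c is the soft-margin penalty of the SVM. The sketch below only evaluates the kernel for the reported gamma=8; it is an illustration of the formula, not the authors' training code, and the function name is hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=8.0):
    """RBF kernel value exp(-gamma * ||x - y||^2) for two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

With gamma as large as 8, similarity decays quickly with distance, so the value of the kernel depends heavily on how the features are scaled before training.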
  18. Experiments: Results
      Table: Summary of Prediction Accuracy (%), by Event-Type
      Statistically significant results are in bold.
  19. Insights
      Performance of onlyCommunity classifiers is the worst.
         The latent nature of network features makes them difficult for a user to perceive directly.
      The onlyContent classifiers give the best performance among the single feature groups.
         Some users end up participating in a discussion after observing information on the public timeline; these ad hoc users are therefore hard to observe via network analysis alone.
         Content is engaging by its quality and nature (information sharing, call for action, or crowdsourcing). For example, a link to an image or video (evidential content) about Reggie Bush's surrender of the Heisman Trophy in September 2010 is likely to provoke many more thoughts in a user's mind and lead them to engage in the discussion.
  20. Insights (cont.)
      Performance of onlyAuthor classifiers is comparable to that of onlyContent classifiers for some topics.
         Impact of the effective presence of influential people in the discussion group
         Insufficiency in content features, reflected by low average connectivity, can be compensated for by author features (e.g., Rich Cronin Death).
      Statistical significance testing shows that allFeatures classifiers perform better than or equivalently to any single-feature-group classifier for 12 out of 14 topics.
         The advantage of using all features is dominant where the degree of randomness in individual dimensions can be very high (e.g., Discovery Building attack).
  21. Insights (cont.)
      No significant correlation between the selection of feature groups and the event types lasting vs. transient.
         Possibility of a shift in event characteristics over time
      The advantage of allFeatures over the other feature groups is generally stronger on unexpected topics than on deterministic ones.
         The degree of randomness is high in discussions surrounding unexpected events.
  22. Conclusion & Future Work
      No single dimension (People, Content, Network) can be expected to perform well in all types of topic discussions; hence there is a strong need to study the dynamics of user engagement using the PCNA framework.
      Experiments with a more refined taxonomy of event types and user engagement factors, taking into account shifts in event characteristics over time
      Semantic analysis of content to enhance content features
      Experiments on other social networks: forums, DBLP
  23. Questions?
      Paper at: http://knoesis.org/library/resource.php?id=1095
      More on Social Media @ Kno.e.sis at http://knoesis.org/research/semweb/projects/socialmedia/


  • Active window: motivated by the concept of ‘novelty and attention’ in the design of today’s social media systems (the underlying reason is information overload: a user cannot keep up with everything).
  • Observe the data in the active window. Take the follower base of the active community members as the samples for the prediction problem. Analyze features for them, and classify whether each follower U is going to join the community C.
  • Event categorization:
      Global (G) vs. Local (L): scale, i.e., how many people the event will attract [e.g., Japan Earthquake vs. Iowa State Fair]
      Deterministic (D) vs. Unexpected (U): whether the event is expected [e.g., Emmy Awards vs. Japan Earthquake]
      Compact (C) vs. Loose (Ls): whether people already know each other and are therefore tightly connected [e.g., ISWC conference vs. Japan Earthquake]
      Transient (T) vs. Lasting (Lt): whether the event lasts for just a short time or for a long time [e.g., President’s Speech vs. Egypt Revolution]
  • Essentially 3 major types of features:
      For the community surrounding the event: size, growth rate
      For users: users that you follow, mention, or retweet from; users that share a large overlap of friends with you; users that share similar profiles
      For tweet content [only from ‘followees’ of new users joining the community]: url, mentions, RT, #; linguistic analysis features
  • Using style of writing for the author; hashtag usage affects attention