Anticipating Discussion Activity
    on Community Forums
  Matthew Rowe, Sofia Angeletou and Harith Alani
   Knowledge Media Institute, The Open University, Milton
                Keynes, United Kingdom

     The Third IEEE International Conference on Social
           Computing. MIT, Boston, USA. 2011
Community Content
• Online communities are now used to:
   – Ask questions
   – Post opinions and ideas
   – Discuss events and current issues

• Content analysis in online communities is attractive for:
   – Market analysis
   – Brand consensus and product opinion

• Social network analytics in the US is predicted to reach
  $1 billion by 2014 (Forrester 2009)

• Masses of data are now being published in online communities:
    – Facebook has more than 60 million status updates per day
       (Facebook statistics 2010)

The Need for Analysis
• Analysts need to know which piece of content will generate
  the most activity
   – i.e. the most auspicious or influential
   – Helps focus the attention of human and computerised
     analysts
         • What to track?
• Need to understand the effect that features (community and
  content) have on the attention that content receives
• Enable content creators to shape their content in order to
  maximise impact
    – E.g. promoters, government policy makers


RQ1: Which features are key to stimulating discussions?
RQ2: How do these features influence discussion length?


Outline
• Anticipating Discussion Activity: Approach Overview
    – Identifying Seed Posts
    – Predicting Discussion Activity
• Features
• Dataset
    – Community Message Board: Boards.ie
• 1. Identifying Seed Posts
• 2. Predicting Discussion Activity
• Findings
• Conclusions




Approach Overview
• Two-stage approach to predict discussion activity in
  online communities:
   1. Identify seed posts
         • i.e. Thread starters that yield a reply
         • Will a given post start a discussion?
         • What are the properties that seed posts exhibit?
              – What parameters tend to trigger a discussion?
    2. Predict discussion activity levels
         • From the identified seed posts
         • What is the level of discussion that a seed post will
           generate?
         • What features correlate with heightened discussion
           activity?


Features
• For each post, model: a) the author, b) the content and
  c) the topical concentration of the author

• F1: User Features
    – In-degree, out-degree: social network properties of the author
    – Post count, age, post rate: participation information of the author
• F2: Content Features
    – Post length, referral count, time in day: surface features of the
      post
    – Complexity: cumulative entropy of terms in the post
    – Readability: Gunning Fog index of the post
    – Informativeness: TF-IDF measure of terms within the post
    – Polarity: average sentiment of terms in the post
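
As a rough illustration of the complexity feature, the sketch below computes the Shannon entropy of the term distribution of a single post. The slides do not give the exact formula or preprocessing, so the tokenisation (lowercasing and whitespace splitting), the use of base-2 logarithms and the function name `term_entropy` are assumptions made for this example.

```python
import math
from collections import Counter


def term_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the term distribution of a post.

    Assumed preprocessing: lowercase the text and split on whitespace.
    """
    terms = text.lower().split()
    if not terms:
        return 0.0
    counts = Counter(terms)
    total = len(terms)
    # Sum p * log2(p) over the distinct terms, then negate.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this reading, a post that repeats one term has entropy 0, and the value grows as the vocabulary of the post becomes more varied.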




Features (2)
• F3: Focus Features
    – Topic entropy: the concentration of the author across
      community forums
         • Higher entropy indicates a wider spread of forum activity
         • More random distribution, less concentrated
    – Topic Likelihood: the likelihood that a user posts in a
      specific forum given his post history
         • Measures the affinity that a user has with a given forum
         • Lower likelihood indicates a user posting on an unfamiliar
           topic
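
The two focus features can be sketched as follows, assuming an author's history is available as a plain list of forum names, one entry per past post. The function names and the relative-frequency estimate of the likelihood are assumptions for illustration; the slides do not state which estimator was used.

```python
import math
from collections import Counter


def forum_entropy(post_forums):
    """Entropy (in bits) of an author's posts across forums.

    Higher values mean activity is spread over more forums.
    """
    counts = Counter(post_forums)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def forum_likelihood(post_forums, forum):
    """Share of the author's past posts made in `forum`.

    Assumed estimator: simple relative frequency over the post history.
    """
    if not post_forums:
        return 0.0
    return post_forums.count(forum) / len(post_forums)
```

For a hypothetical history of three posts in a rugby forum and one in a gaming forum, the likelihood of the rugby forum is 0.75, while a forum the author has never posted in gets 0.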




Dataset: Boards.ie
• Irish community message board that was established in 1998
• Covers a wide array of topics and themes in forums
    – E.g. World of Warcraft, Japanese Culture, Rugby


• We were provided with the complete dataset spanning 1998-
  2008 of all posts and forum information
    – Focussed on 2006 due to the scale of the entire dataset
• No explicit social connections exist in the dataset
    – Social network features were built from the reply-to graph
• A 6-month window prior to each post's date was used to build the
  user and focus features
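
A minimal sketch of how in-degree and out-degree could be derived from a reply-to graph restricted to the window before a post. The record layout `(replier, replied_to, timestamp)`, the 182-day approximation of six months, and counting distinct users rather than individual replies are all assumptions made here, not details taken from the dataset.

```python
from datetime import datetime, timedelta


def degree_features(replies, author, post_date, window_days=182):
    """Return (in_degree, out_degree) for `author` in the reply-to graph.

    replies: iterable of (replier, replied_to, timestamp) tuples (assumed layout).
    Only replies inside the window [post_date - window_days, post_date) count.
    """
    start = post_date - timedelta(days=window_days)
    repliers = set()    # distinct users who replied to the author
    replied_to = set()  # distinct users the author replied to
    for src, dst, ts in replies:
        if not (start <= ts < post_date):
            continue
        if dst == author and src != author:
            repliers.add(src)
        if src == author and dst != author:
            replied_to.add(dst)
    return len(repliers), len(replied_to)
```

Restricting the graph to replies made before the post date keeps the features free of information that would only be available after the post was published.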




1. Identifying Seed Posts
• Will a given post start a discussion?
• What are the properties that seed posts exhibit?

• Experiment Setup:
    –   Used all thread starter posts from Boards.ie in 2006
    –   Training/validation/testing sets using a 70/20/10% random split
    –   Binary classification task: Is this a seed post or not?
    –   Measures: precision, recall, F-measure, area under ROC curve
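
The setup above can be sketched with the standard library alone. The helper names, the fixed random seed and the encoding of labels as 1 (seed post) and 0 (non-seed post) are assumptions for this example; the area under the ROC curve is left out to keep it short.

```python
import random


def split_70_20_10(items, seed=0):
    """Randomly split items into training, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10
    n_valid = n * 2 // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-measure for the positive (seed post) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```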


• Performed 2 experiments:
    – a) Model Selection
         • Tested individual feature sets (user, content, focus) and combinations
    – b) Feature Assessment
         • Drop one feature at a time and record the reduction in F-measure
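
The feature assessment in b) is a leave-one-out ablation, which could be sketched as below. The `evaluate` callback is hypothetical: it stands in for training a classifier on the given feature names and returning its F-measure, which the slides do not describe in code.

```python
def feature_drop_report(features, evaluate):
    """Rank features by the score lost when each is removed in turn.

    features: list of feature names.
    evaluate: hypothetical callable mapping a feature list to a score
              (for example the F-measure of a model trained on them).
    """
    baseline = evaluate(features)
    reductions = {}
    for name in features:
        remaining = [f for f in features if f != name]
        reductions[name] = baseline - evaluate(remaining)
    # Largest reduction first: these features matter most to the model.
    return sorted(reductions.items(), key=lambda item: item[1], reverse=True)
```

A feature whose removal causes the largest drop is the one the model relies on most; a negative reduction would mean the model scores higher without that feature.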



1.a) Model Selection




1.b) Feature Assessment




1.b) Feature Assessment (2)




2. Predicting Discussion Activity
• What is the level of discussion that a seed post will generate?
• What features correlate with heightened discussion activity?

• Experiment Setup:
    – Train: seed posts in 70% training split
    – Test: seed posts in 20% validation split
    – Measure: Normalised Discounted Cumulative Gain (nDCG)
         • Look at varying rank positions: nDCG@k, k=1,2,5,10,20,50,100
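
A minimal sketch of nDCG@k for this task, assuming seed posts are ranked by a model's predicted score and the true discussion length (number of replies) is used as the gain. The log2 discount and the function names are assumptions; the slides do not state which DCG variant was used.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions (log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(predicted_scores, true_lengths, k):
    """nDCG@k when posts are ranked by predicted score.

    true_lengths: actual discussion length of each post, used as the gain.
    """
    order = sorted(range(len(true_lengths)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_gains = [true_lengths[i] for i in order]
    ideal_gains = sorted(true_lengths, reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

A value of 1 at a given k means the predicted ranking places the posts with the longest discussions in the top k positions in the ideal order.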


• Performed 2 experiments
    – a) Model Selection
         • Regression models: Linear, Isotonic, Support Vector Regression
         • Tested individual feature sets (user, content, focus) and combinations
    – b) Feature Contributions
         • Assess the features in the best performing model from a)


2.a) Model Selection




2.a) Model Selection (2)




[Figure panels: Linear, Isotonic, Support Vector Regression]




2.b) Feature Contributions
• What features correlate with heightened discussion
  activity?




Findings
RQ1: Which features are key to stimulating discussions?
•   Having many URLs in a post can negatively impact discussion activity
     – Could associate the post with spam content
•   Seed posts are associated with greater forum likelihood
•   Lower informativeness is associated with seed posts
     – i.e. seeds use language that is familiar to the community


RQ2: How do these features influence discussion length?
•   Lower forum entropy = heightened discussion activity
•   Greater complexity = heightened discussion activity
     – i.e. include more diverse language in the post
•   Increased activity can be expected from an increase in forum
    likelihood coupled with a decrease in forum entropy
•   Negative sentiment posts generate more activity



Conclusions and Future Work
• The two-stage approach is able to:
    – Identify seed posts to a high degree of accuracy
        • F-measure: 0.792
    – Predict discussion activity levels
        • nDCG@1: 0.89 (linear regression model)
        • Content and focus features yield best performing model
              – Average nDCG@k: 0.756


• Findings inform:
    – Market Analysts to track high activity posts from the outset
    – Content creators to shape content in order to maximise impact


• Currently applying approach over different platforms:
    – How can we predict activity on a given social web system?
    – How do social web systems differ in how they generate activity?

Questions?
Web: http://people.kmi.open.ac.uk/rowe
Email: m.c.rowe@open.ac.uk
Twitter: @mattroweshow





Mais conteúdo relacionado

Semelhante a Socialcom2011 discussionactivityprediction

Measuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMeasuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMatthew Rowe
 
Social Media Analytics with a pinch of semantics
Social Media Analytics with a pinch of semanticsSocial Media Analytics with a pinch of semantics
Social Media Analytics with a pinch of semanticsThe Open University
 
E-challenges11 WeGov Workshop
E-challenges11 WeGov WorkshopE-challenges11 WeGov Workshop
E-challenges11 WeGov WorkshopWeGov project
 
3: web technologies
3: web technologies3: web technologies
3: web technologiesCOMP 113
 
Content Potluck: Bring Everyone to the Community Table
Content Potluck: Bring Everyone to the Community TableContent Potluck: Bring Everyone to the Community Table
Content Potluck: Bring Everyone to the Community TableNikoletta Vecsei Harrold
 
Predicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebPredicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebMatthew Rowe
 
Conversations in Context: A Twitter Case for Social Media Systems Design
Conversations in Context: A Twitter Case for Social Media Systems DesignConversations in Context: A Twitter Case for Social Media Systems Design
Conversations in Context: A Twitter Case for Social Media Systems DesignCommunitySense
 
IBM Connections 3.0.1 & Beyond
IBM Connections 3.0.1 & BeyondIBM Connections 3.0.1 & Beyond
IBM Connections 3.0.1 & BeyondLidia Vikulova
 
Social Media Analysis... according to Net7
Social Media Analysis... according to Net7Social Media Analysis... according to Net7
Social Media Analysis... according to Net7Net7
 
Improving Library Resource Discovery
Improving Library Resource DiscoveryImproving Library Resource Discovery
Improving Library Resource DiscoveryDanya Leebaw
 
Implimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled TechnologyImplimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled TechnologyIndiana Online Users Group
 
Class 5: Project details
Class 5: Project detailsClass 5: Project details
Class 5: Project detailsCOMP 113
 
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013Digital Methods Initiative
 
How are project-specific forums utilized? A study of participation, content, ...
How are project-specific forums utilized? A study of participation, content, ...How are project-specific forums utilized? A study of participation, content, ...
How are project-specific forums utilized? A study of participation, content, ...Yusuf Sulistyo Nugroho
 
Global Redirective Practices
Global Redirective PracticesGlobal Redirective Practices
Global Redirective Practicesadjwilli
 

Semelhante a Socialcom2011 discussionactivityprediction (20)

Measuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMeasuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online Communities
 
212 joy moore ssp 2007
212 joy moore ssp 2007212 joy moore ssp 2007
212 joy moore ssp 2007
 
Social Media Analytics with a pinch of semantics
Social Media Analytics with a pinch of semanticsSocial Media Analytics with a pinch of semantics
Social Media Analytics with a pinch of semantics
 
E-challenges11 WeGov Workshop
E-challenges11 WeGov WorkshopE-challenges11 WeGov Workshop
E-challenges11 WeGov Workshop
 
Intro to UOSM2012
Intro to UOSM2012Intro to UOSM2012
Intro to UOSM2012
 
3: web technologies
3: web technologies3: web technologies
3: web technologies
 
Content Potluck: Bring Everyone to the Community Table
Content Potluck: Bring Everyone to the Community TableContent Potluck: Bring Everyone to the Community Table
Content Potluck: Bring Everyone to the Community Table
 
Office365 Communities
Office365 CommunitiesOffice365 Communities
Office365 Communities
 
Predicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebPredicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic Web
 
Conversations in Context: A Twitter Case for Social Media Systems Design
Conversations in Context: A Twitter Case for Social Media Systems DesignConversations in Context: A Twitter Case for Social Media Systems Design
Conversations in Context: A Twitter Case for Social Media Systems Design
 
IBM Connections 3.0.1 & Beyond
IBM Connections 3.0.1 & BeyondIBM Connections 3.0.1 & Beyond
IBM Connections 3.0.1 & Beyond
 
Nh Web 2.0 Overview CW V2
Nh Web 2.0 Overview CW V2Nh Web 2.0 Overview CW V2
Nh Web 2.0 Overview CW V2
 
Social Media Analysis... according to Net7
Social Media Analysis... according to Net7Social Media Analysis... according to Net7
Social Media Analysis... according to Net7
 
Improving Library Resource Discovery
Improving Library Resource DiscoveryImproving Library Resource Discovery
Improving Library Resource Discovery
 
sm@jgc Session Three
sm@jgc Session Threesm@jgc Session Three
sm@jgc Session Three
 
Implimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled TechnologyImplimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled Technology
 
Class 5: Project details
Class 5: Project detailsClass 5: Project details
Class 5: Project details
 
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
 
How are project-specific forums utilized? A study of participation, content, ...
How are project-specific forums utilized? A study of participation, content, ...How are project-specific forums utilized? A study of participation, content, ...
How are project-specific forums utilized? A study of participation, content, ...
 
Global Redirective Practices
Global Redirective PracticesGlobal Redirective Practices
Global Redirective Practices
 

Mais de WeGov project

UPDATED USER GUIDE for the Final WeGov Toolbox 3.0: “Was ist WeGov?”(in German)
UPDATED USER GUIDE for the Final WeGov Toolbox 3.0: “Was ist WeGov?”(in German)UPDATED USER GUIDE for the Final WeGov Toolbox 3.0: “Was ist WeGov?”(in German)
UPDATED USER GUIDE for the Final WeGov Toolbox 3.0: “Was ist WeGov?”(in German)WeGov project
 
2nd WeGov Workshop Agenda
2nd WeGov Workshop Agenda2nd WeGov Workshop Agenda
2nd WeGov Workshop AgendaWeGov project
 
WeGov updated generic presentation
WeGov updated generic presentationWeGov updated generic presentation
WeGov updated generic presentationWeGov project
 
Final WeGov Flyer (German edition)
Final WeGov Flyer (German edition)Final WeGov Flyer (German edition)
Final WeGov Flyer (German edition)WeGov project
 
Final WeGov Poster (Updated!)
Final WeGov Poster (Updated!)Final WeGov Poster (Updated!)
Final WeGov Poster (Updated!)WeGov project
 
Final WeGov Flyer (Updated!)
 Final WeGov Flyer (Updated!) Final WeGov Flyer (Updated!)
Final WeGov Flyer (Updated!)WeGov project
 
“What is WeGov” - User Guide for the Phase 2 Evaluation (in English)
“What is WeGov” - User Guide for the Phase 2 Evaluation (in English)“What is WeGov” - User Guide for the Phase 2 Evaluation (in English)
“What is WeGov” - User Guide for the Phase 2 Evaluation (in English)WeGov project
 
WeGov Bundestag Interview Highlights 2011 @Co:llaboratory
WeGov Bundestag Interview Highlights 2011 @Co:llaboratoryWeGov Bundestag Interview Highlights 2011 @Co:llaboratory
WeGov Bundestag Interview Highlights 2011 @Co:llaboratoryWeGov project
 
“Was ist WeGov” - User Guide for the Phase 2 Evaluation (in German)
“Was ist WeGov” - User Guide for the Phase 2 Evaluation (in German)“Was ist WeGov” - User Guide for the Phase 2 Evaluation (in German)
“Was ist WeGov” - User Guide for the Phase 2 Evaluation (in German)WeGov project
 
WeGov Software Präsentation (Prototyp 2.5) im Bundestag
WeGov Software Präsentation (Prototyp 2.5) im BundestagWeGov Software Präsentation (Prototyp 2.5) im Bundestag
WeGov Software Präsentation (Prototyp 2.5) im BundestagWeGov project
 
Public Politician Profiles on Facebook and the Gap of Authenticity
Public Politician Profiles on Facebook and the Gap of AuthenticityPublic Politician Profiles on Facebook and the Gap of Authenticity
Public Politician Profiles on Facebook and the Gap of AuthenticityWeGov project
 
Projektvorstellung WeGov - Deutscher Bundestag
Projektvorstellung WeGov - Deutscher BundestagProjektvorstellung WeGov - Deutscher Bundestag
Projektvorstellung WeGov - Deutscher BundestagWeGov project
 
Digital Monitoring of societal Discussions in online social Networks
Digital Monitoring of societal Discussions in online social NetworksDigital Monitoring of societal Discussions in online social Networks
Digital Monitoring of societal Discussions in online social NetworksWeGov project
 
Iswc2011 role-composition-analysis
Iswc2011 role-composition-analysisIswc2011 role-composition-analysis
Iswc2011 role-composition-analysisWeGov project
 
Eswc2011 socialweb-discussionpredictions
Eswc2011 socialweb-discussionpredictionsEswc2011 socialweb-discussionpredictions
Eswc2011 socialweb-discussionpredictionsWeGov project
 
1st WeGov Workshop Agenda
1st WeGov Workshop Agenda1st WeGov Workshop Agenda
1st WeGov Workshop AgendaWeGov project
 
1st WeGov Workshop description
1st WeGov Workshop description1st WeGov Workshop description
1st WeGov Workshop descriptionWeGov project
 
1st WeGov Workshop within eChallenges e-2011, in October 2011
1st WeGov Workshop within eChallenges e-2011, in October 20111st WeGov Workshop within eChallenges e-2011, in October 2011
1st WeGov Workshop within eChallenges e-2011, in October 2011WeGov project
 

Mais de WeGov project (20)

UPDATED USER GUIDE for the Final WeGov Toolbox 3.0: “Was ist WeGov?”(in German)
UPDATED USER GUIDE for the Final WeGov Toolbox 3.0: “Was ist WeGov?”(in German)UPDATED USER GUIDE for the Final WeGov Toolbox 3.0: “Was ist WeGov?”(in German)
UPDATED USER GUIDE for the Final WeGov Toolbox 3.0: “Was ist WeGov?”(in German)
 
2nd WeGov Workshop Agenda
2nd WeGov Workshop Agenda2nd WeGov Workshop Agenda
2nd WeGov Workshop Agenda
 
WeGov updated generic presentation
WeGov updated generic presentationWeGov updated generic presentation
WeGov updated generic presentation
 
Final WeGov Flyer (German edition)
Final WeGov Flyer (German edition)Final WeGov Flyer (German edition)
Final WeGov Flyer (German edition)
 
WeGov Brochure (New!)
WeGov Brochure (New!)WeGov Brochure (New!)
WeGov Brochure (New!)
 
Final WeGov Poster (Updated!)
Final WeGov Poster (Updated!)Final WeGov Poster (Updated!)
Final WeGov Poster (Updated!)
 
Final WeGov Flyer (Updated!)
 Final WeGov Flyer (Updated!) Final WeGov Flyer (Updated!)
Final WeGov Flyer (Updated!)
 
“What is WeGov” - User Guide for the Phase 2 Evaluation (in English)
“What is WeGov” - User Guide for the Phase 2 Evaluation (in English)“What is WeGov” - User Guide for the Phase 2 Evaluation (in English)
“What is WeGov” - User Guide for the Phase 2 Evaluation (in English)
 
WeGov Bundestag Interview Highlights 2011 @Co:llaboratory
WeGov Bundestag Interview Highlights 2011 @Co:llaboratoryWeGov Bundestag Interview Highlights 2011 @Co:llaboratory
WeGov Bundestag Interview Highlights 2011 @Co:llaboratory
 
“Was ist WeGov” - User Guide for the Phase 2 Evaluation (in German)
“Was ist WeGov” - User Guide for the Phase 2 Evaluation (in German)“Was ist WeGov” - User Guide for the Phase 2 Evaluation (in German)
“Was ist WeGov” - User Guide for the Phase 2 Evaluation (in German)
 
WeGov Software Präsentation (Prototyp 2.5) im Bundestag
WeGov Software Präsentation (Prototyp 2.5) im BundestagWeGov Software Präsentation (Prototyp 2.5) im Bundestag
WeGov Software Präsentation (Prototyp 2.5) im Bundestag
 
Public Politician Profiles on Facebook and the Gap of Authenticity
Public Politician Profiles on Facebook and the Gap of AuthenticityPublic Politician Profiles on Facebook and the Gap of Authenticity
Public Politician Profiles on Facebook and the Gap of Authenticity
 
Projektvorstellung WeGov - Deutscher Bundestag
Projektvorstellung WeGov - Deutscher BundestagProjektvorstellung WeGov - Deutscher Bundestag
Projektvorstellung WeGov - Deutscher Bundestag
 
Digital Monitoring of societal Discussions in online social Networks
Digital Monitoring of societal Discussions in online social NetworksDigital Monitoring of societal Discussions in online social Networks
Digital Monitoring of societal Discussions in online social Networks
 
Iswc2011 role-composition-analysis
Iswc2011 role-composition-analysisIswc2011 role-composition-analysis
Iswc2011 role-composition-analysis
 
Eswc2011 socialweb-discussionpredictions
Eswc2011 socialweb-discussionpredictionsEswc2011 socialweb-discussionpredictions
Eswc2011 socialweb-discussionpredictions
 
1st WeGov Workshop Agenda
1st WeGov Workshop Agenda1st WeGov Workshop Agenda
1st WeGov Workshop Agenda
 
1st WeGov Workshop description
1st WeGov Workshop description1st WeGov Workshop description
1st WeGov Workshop description
 
1st WeGov Workshop within eChallenges e-2011, in October 2011
1st WeGov Workshop within eChallenges e-2011, in October 20111st WeGov Workshop within eChallenges e-2011, in October 2011
1st WeGov Workshop within eChallenges e-2011, in October 2011
 
WeGov Field Trials
WeGov Field TrialsWeGov Field Trials
WeGov Field Trials
 

Último

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .

Socialcom2011 discussionactivityprediction

  • 1. Anticipating Discussion Activity on Community Forums
      Matthew Rowe, Sofia Angeletou and Harith Alani
      Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
      The Third IEEE International Conference on Social Computing. MIT, Boston, USA. 2011
  • 2. Community Content
      • Online communities are now used to:
          – Ask questions
          – Post opinions and ideas
          – Discuss events and current issues
      • Content analysis in online communities is attractive for:
          – Market analysis
          – Brand consensus and product opinion
      • Social network analytics in the US is predicted to reach $1 billion by 2014 (Forrester 2009)
      • Masses of data are now being published in online communities:
          – Facebook has more than 60 million status updates per day (Facebook statistics 2010)
  • 4. The Need for Analysis
      • Analysts need to know which piece of content will generate the most activity
          – i.e. the most auspicious or influential
          – Helps focus the attention of human and computerised analysts
              • What to track?
      • Need to understand the effect that features (community and content) have on attention to content
      • Enable content creators to shape their content in order to maximise impact
          – E.g. promoters, government policy makers
      RQ1: Which features are key to stimulating discussions?
      RQ2: How do these features influence discussion length?
  • 5. Outline
      • Anticipating Discussion Activity: Approach Overview
          – Identifying Seed Posts
          – Predicting Discussion Activity
      • Features
      • Dataset
          – Community Message Board: Boards.ie
      • 1. Identifying Seed Posts
      • 2. Predicting Discussion Activity
      • Findings
      • Conclusions
  • 6. Approach Overview
      • Two-stage approach to predict discussion activity in online communities:
          1. Identify seed posts
              • i.e. thread starters that yield a reply
              • Will a given post start a discussion?
              • What are the properties that seed posts exhibit?
                  – What parameters tend to trigger a discussion?
          2. Predict discussion activity levels
              • From the identified seed posts
              • What is the level of discussion that a seed post will generate?
              • What features correlate with heightened discussion activity?
  • 7. Features
      • For each post, model: a) the author, b) the content and c) the topical concentration of the author
      • F1: User Features
          – In-degree, out-degree: social network properties of the author
          – Post count, age, post rate: participation information of the author
      • F2: Content Features
          – Post length, referral count, time in day: surface features of the post
          – Complexity: cumulative entropy of terms in the post
          – Readability: Gunning Fog index of the post
          – Informativeness: TF-IDF measure of terms within the post
          – Polarity: average sentiment of terms in the post
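
The complexity feature on slide 7 is described only as the "cumulative entropy of terms in the post". The following is a minimal sketch of one plausible reading, assuming Shannon entropy over the post's own term-frequency distribution with whitespace tokenisation; the function name `term_entropy`, the tokeniser and the log base are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def term_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the term distribution within one post.

    Assumption: terms are lower-cased, whitespace-separated tokens.
    A post that repeats a single term scores 0; a post whose terms are
    all distinct scores log2(number of terms).
    """
    terms = text.lower().split()
    if not terms:
        return 0.0
    counts = Counter(terms)
    total = len(terms)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this reading, a higher value means a more varied vocabulary within the post, which matches the slides' gloss of complexity as "more diverse language".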
  • 8. Features (2)
      • F3: Focus Features
          – Topic entropy: the concentration of the author across community forums
              • Higher entropy indicates a wider spread of forum activity
              • More random distribution, less concentrated
          – Topic likelihood: the likelihood that a user posts in a specific forum given his post history
              • Measures the affinity that a user has with a given forum
              • Lower likelihood indicates a user posting on an unfamiliar topic
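
The two focus features on slide 8 can be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes a user's history is simply a list of the forum names they posted in during the feature window, that topic entropy is the Shannon entropy of that distribution, and that topic likelihood is the relative frequency of the target forum in the history. All function names are hypothetical.

```python
import math
from collections import Counter


def forum_distribution(history):
    """Map each forum in the user's post history to its share of their posts."""
    counts = Counter(history)
    total = sum(counts.values())
    return {forum: c / total for forum, c in counts.items()}


def topic_entropy(history):
    """Entropy (bits) of the user's posts across forums; higher = wider spread."""
    if not history:
        return 0.0
    return -sum(p * math.log2(p) for p in forum_distribution(history).values())


def topic_likelihood(history, forum):
    """Share of the user's past posts made in `forum`; 0.0 for an unseen forum."""
    if not history:
        return 0.0
    return forum_distribution(history).get(forum, 0.0)
```

For example, a history of three Rugby posts and one World of Warcraft post gives a likelihood of 0.75 for Rugby and 0.0 for a forum the user has never posted in.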
  • 9. Dataset: Boards.ie
      • Irish community message board that was established in 1998
      • Covers a wide array of topics and themes in forums
          – E.g. World of Warcraft, Japanese Culture, Rugby
      • We were provided with the complete dataset spanning 1998-2008 of all posts and forum information
          – Focussed on 2006 due to the scale of the entire dataset
      • No explicit social connections exist in the dataset
          – Social network features were built from the reply-to graph
      • 6-month window prior to the post date was used to build the user and focus features
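
Slide 9 states that the social network features were built from the reply-to graph. A minimal sketch of how in-degree and out-degree could be derived is shown below. It assumes replies are available as `(replier, original_author)` pairs already filtered to the 6-month window, that degrees count distinct users, and that self-replies are ignored; these choices and the name `degree_features` are assumptions made for illustration.

```python
from collections import defaultdict


def degree_features(replies):
    """Return {user: (in_degree, out_degree)} from (replier, author) pairs.

    in_degree  = number of distinct users who replied to the user
    out_degree = number of distinct users the user replied to
    """
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for replier, author in replies:
        if replier == author:
            continue  # assumption: self-replies carry no social signal
        outgoing[replier].add(author)
        incoming[author].add(replier)
    users = set(incoming) | set(outgoing)
    return {u: (len(incoming[u]), len(outgoing[u])) for u in users}
```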
  • 10. 1. Identifying Seed Posts
      • Will a given post start a discussion?
      • What are the properties that seed posts exhibit?
      • Experiment Setup:
          – Used all thread starter posts from Boards.ie in 2006
          – Training/validation/testing sets using a 70/20/10% random split
          – Binary classification task: Is this a seed post or not?
          – Measures: precision, recall, f-measure, area under ROC curve
      • Performed 2 experiments:
          – a) Model Selection
              • Tested individual feature sets (user, content, focus) and combinations
          – b) Feature Assessment
              • Dropping 1 feature at a time, record reduction in f-measure
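
The experiment setup on slide 10 can be sketched in plain Python. The code below is an assumption-laden illustration, not the authors' pipeline: `split_70_20_10` performs a seeded random 70/20/10 split, `f_measure` computes the balanced F-measure for the binary seed/non-seed task, and `feature_drop_report` takes a caller-supplied `evaluate(feature_names)` function (hypothetical; it would train a classifier on the given features and return its f-measure) and records the reduction when each feature is dropped in turn.

```python
import random


def split_70_20_10(items, seed=0):
    """Shuffle with a fixed seed and cut into train/validation/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (seed) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def feature_drop_report(features, evaluate):
    """Reduction in f-measure when each feature is removed from the full set."""
    baseline = evaluate(features)
    return {
        dropped: baseline - evaluate([f for f in features if f != dropped])
        for dropped in features
    }
```

The features with the largest reduction are the ones the classifier relies on most, which is the quantity the feature assessment experiment records.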
  • 11. 1.a) Model Selection
  • 12. 1.b) Feature Assessment
  • 13. 1.b) Feature Assessment
  • 14. 2. Predicting Discussion Activity
      • What is the level of discussion that a seed post will generate?
      • What features correlate with heightened discussion activity?
      • Experiment Setup:
          – Train: seed posts in 70% training split
          – Test: seed posts in 20% validation split
          – Measure: Normalised Discounted Cumulative Gain (nDCG)
              • Look at varying rank positions: nDCG@k, k=1,2,5,10,20,50,100
      • Performed 2 experiments:
          – a) Model Selection
              • Regression models: Linear, Isotonic, Support Vector Regression
              • Tested individual feature sets (user, content, focus) and combinations
          – b) Feature Contributions
              • Assess the features in the best performing model from a)
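
The nDCG@k measure named on slide 14 can be sketched as below. This assumes the common formulation in which the gain at rank i is the true discussion length divided by log2(i + 1), and in which posts are ranked by the regression model's predicted score; the slides do not state which nDCG variant was used, so treat the discount and the function names as assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(predicted_scores, true_lengths, k):
    """nDCG@k: DCG of the predicted ranking divided by DCG of the ideal one."""
    # Rank seed posts by predicted score, highest first.
    order = sorted(
        range(len(true_lengths)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked = [true_lengths[i] for i in order]
    ideal = sorted(true_lengths, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A value of 1.0 at a given k means the model placed the k longest discussions in the correct order at the top of its ranking.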
  • 15. 2.a) Model Selection
  • 16. 2.a) Model Selection (panels: Linear, Isotonic, Support Vector Regression)
  • 17. 2.b) Feature Contributions
      • What features correlate with heightened discussion activity?
  • 18. Findings
      RQ1: Which features are key to stimulating discussions?
      • Having many URLs in a post can negatively impact discussion activity
          – Could associate the post with spam content
      • Seed posts are associated with greater forum likelihood
      • Lower informativeness is associated with seed posts
          – i.e. seeds use language that is familiar to the community
      RQ2: How do these features influence discussion length?
      • Lower forum entropy = heightened discussion activity
      • Greater complexity = heightened discussion activity
          – i.e. include more diverse language in the post
      • Increased activity can be expected from an increase in forum likelihood coupled with a decrease in forum entropy
      • Negative sentiment posts generate more activity
  • 19. Conclusions and Future Work
      • The two-stage approach is able to:
          – Identify seed posts to a high degree of accuracy
              • F-measure: 0.792
          – Predict discussion activity levels
              • nDCG@1: 0.89 (linear regression model)
      • Content and focus features yield the best performing model
          – Average nDCG@k: 0.756
      • Findings inform:
          – Market analysts to track high activity posts from the outset
          – Content creators to shape content in order to maximise impact
      • Currently applying the approach over different platforms:
          – How can we predict activity on a given social web system?
          – How do social web systems differ in how they generate activity?
  • 20. Questions?
      Web: http://people.kmi.open.ac.uk/rowe
      Email: m.c.rowe@open.ac.uk
      Twitter: @mattroweshow

Editor's Notes

  1. 80% to 20% skew towards seeds over non-seeds.
  2. Content features outperform user features. Content and focus outperform other feature combinations. All features together work best. This differs from the Twitter analysis, where user features were better predictors than content features.
  3. Trained J48 with all features using the training split. Tested it on the held-out 10%. Dropped 1 feature at a time from the model and classified the test split, looking for the features whose removal gives the greatest reduction in accuracy.
  4. Boxplots show: Higher referral counts correlate with non-seeds (spam). Higher forum likelihood correlates with seeds: users who concentrate their discussions within select forums will start a discussion, as they're known to the community. Higher informativeness correlates with non-seeds.
  5. Solitary features: User features perform best as the solitary feature set for Linear regression and SVR. Focus features are best for Isotonic regression. Combined: Content and focus perform best for Linear and Isotonic regression.
  6. Smallest SD for content and focus features.
  7. A user can expect increased discussion activity if he/she: has low forum entropy; has high forum likelihood; is negative in his/her posts; uses complex language (wide vocabulary, i.e. articulate).