SlideShare uma empresa Scribd logo
1 de 58
Baixar para ler offline
Probabilistic Graphical Models for
Credibility Analysis in Evolving
Online Communities
Subhabrata Mukherjee
Max Planck Institute for Informatics, Germany
smukherjee@mpi-inf.mpg.de
Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online
Communities
↘ Temporal Evolution of Online
Communities
↘ Credibility Analysis of
Product Reviews
● Conclusions
2
Online Communities as
a Knowledge Resource
● Online communities are massive
repositories of knowledge accessed by
regular users and professionals
↘ 59% of adult U.S. population and half
of U.S. physicians rely on online
resources [IMS Health Report, 2014]
↘ 40% of online consumers consult
online reviews before buying products
[Nielson Corporation, 2016]
● However their usability is restricted due to
serious credibility concerns (e.g., spams,
misinformation, bias etc.)
Concerns
4
Misinformation for health can
have hazardous consequences
“Rapid spread of misinformation online” --- one of
top 10 challenges as per The World Economic Forum
Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online
Communities
↘ Temporal Evolution of Online
Communities
↘ Credibility Analysis of
Product Reviews
● Conclusions
5
Truth Finding
Structured data (e.g., SPO
triples, tables, networks)
Objective facts (e.g.,
Obama_BornIn_Hawaii vs.
Obama_BornIn_Kenya)
No contextual data (text)
No external KB, metadata
6
Linguistic Analysis
Unstructured text
Subjective information (e.g.,
opinion spam, bias, viewpoint )
External KB (e.g., WordNet,
KG)
No network / interactions,
metadata
1. How can we jointly leverage users,
network, and context for credibility
analysis in online communities?
2. How can we model users’ evolution?
3. How can we deal with limited data?
4. How can we generate interpretable
explanations for credibility verdict?
7
Research Questions
Contributions
● Credibility Analysis Framework for Online Communities
↘ Classification: Health Communities [SIGKDD 2014]
↘ Regression: News Communities [CIKM 2015]
● Temporal Evolution of Online Communities
[ICDM 2015, SIGKDD 2016]
● Credibility Analysis of Product Reviews
[ECML-PKDD 2016, SDM 2017]
8
Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online
Communities
↘ Temporal Evolution of Online
Communities
↘ Credibility Analysis of
Product Reviews
● Conclusions
9
What is Credibility?
10
“A statement is credible if it is reported
by a trustworthy user in an objective
language”
“Trustworthy users corroborate each
other on credible statements”
Credibility Analysis Framework for Classification
Problem: Given a set of posts from different users, extract credible statements
(subject-predicate-object triples like DrugX_HasSideEffect_Y) from trustworthy users
11Subhabrata Mukherjee, Gerhard Weikum and Cristian Danescu-Niculescu-Mizil: SIGKDD 2014
Credibility Analysis Framework for Classification
Subhabrata Mukherjee, Gerhard Weikum and Cristian Danescu-Niculescu-Mizil: SIGKDD 2014 12
Problem: Given a set of posts from different users, extract credible statements
(subject-predicate-object triples like DrugX_HasSideEffect_Y) from trustworthy users
Network of Interactions: Cliques
Statements: An IE tool
generates candidate
triple patterns like:
Xanax_causes_headache,
Xanax_gave_demonic-feel
Potentially thousands of such triples,
with only a handful of credible ones
➔ Each user, post, and statement is a random variable with edges depicting interactions.
Variables have observable features (e.g, authority, emotionality).
➔ A clique is formed between each user writing a post containing a statement.
13
Network of Interactions: Cliques
Idea: Trustworthy users corroborate on credible statements in objective language
Statements: An IE tool
generates candidate
triple patterns like:
Xanax_causes_headache,
Xanax_gave_demonic-feel
Potentially thousands of such triples,
with only a handful of credible ones
Each user, post, and statement is a random variable with edges depicting interactions
14
15
Conditional Random Field to Exploit Joint Interactions
(Users + Network + Context)
Partial Supervision: Expert stated (top 20%) side-effects of drugs as partial training labels.
Model predicts labels of unobserved statements.
How to complement expert
medical knowledge with
large scale non-expert data?
Semi-Supervised Conditional Random Field
1. Estimate user trustworthiness:
2. Estimate label of unknown statements Su
by Gibbs Sampling:
3. Maximize log-likelihood to estimate feature weights:
4. Apply E-Step and M-Step till convergence
16
Healthforum Dataset
● Healthboards.com community (www.healthboards.com) with 850,000
registered users and 4.5 million posts
17
● Expert labels about drugs from MayoClinic (www.mayoclinic.org)
↘ 6 widely used drugs for experimentation
18
19
What constitutes credible language?
Affective Emotions
confidence
sympathy
self-esteem
eagerness
coolness
compunction
anxiety
embarrassment
misery
distress
20
What constitutes credible language?
determiner (this, that,..)
negation (not, never, ..)
second person (you, ..)
conjunction (therefore,
consequently, ..)
contrast (despite, though, ..)
question (what, why, ..)
conditional (if)
adverb (maybe, probably, ..)
modality (might, could, ..)
Discourse and Modalities
21
Credibility Analysis Framework
for Regression
In many online communities users rate items on their quality
Credibility Analysis in News Communities
Topics
Climate Change
Sources
trunews.com
Articles
“Global warming is a
hoax”
Sources / Users
Scientificamerican.com
snopes.com
user-donald
Reviews & Ratings
scientific analysis, 1.5/ 5,
conspiratory theory
22
However, user feedback is often subjective; influenced by their bias and viewpoints
Reviews / Ratings
scientific analysis, 1.5/ 5,
conspiratory theory
Topics
Climate Change
Articles
“Global warming is a
hoax”
Sources
trunews.com
Sources / Users
Scientificamerican.com
snopes.com
user-donald
Idea: Trustworthy sources publish objective articles corroborated by expert
users with credible reviews/ratings 23
We use CRF to capture these mutual interactions in
news communities (e.g., newstrust.net, digg, reddit)
to jointly rank all of the underlying factors.
Credibility Analysis Framework for Regression
Online Communities: Factors
Related to Ensemble Learning, Learning to Rank
How to incorporate continuous ratings instead of discrete labels in CRF ?
25Subhabrata Mukherjee and Gerhard Weikum: CIKM 2015
Probability Mass Function for discrete labels:
Probability Density Function for continuous ratings:
Energy Function to Combine All
How to incorporate continuous ratings instead of discrete labels in CRF ?
27
● We show that a certain energy function for clique potential --- geared for
reducing mean-squared-error --- results in multivariate gaussian p.d.f. !!!
● Constrained Gradient Ascent for inference
Subhabrata Mukherjee and Gerhard Weikum: CIKM 2015
Predicting Article Credibility Ratings in Newstrust.net
28
Progressive decrease in mean squared error with
more network interactions, and context
Take-away
● Semi-supervised and Continuous CRF to jointly identify trustworthy users,
credible statements, and reliable postings in online communities
● A framework to incorporate richer aspects like user expertise, topics /
facets, temporal evolution etc.
29
Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online
Communities
↘ Temporal Evolution of Online
Communities
↘ Credibility Analysis of
Product Reviews
● Conclusions
30
● Online communities are dynamic, as users join and leave; acquire new
vocabulary; evolve and mature over time
● Trustworthiness and expertise of users evolve over time
Temporal Evolution
31
How to capture evolving user expertise?
“My first DSLR. Excellent camera, takes great pictures with high
definition, without a doubt it makes honor to its name.”
[Aug, 1997]
“The EF 75-300 mm lens is only good to be used outside. The 2.2X
HD lens can only be used for specific items; filters are useless if
ISO, AP,... . The short 18-55mm lens is cheap and should have a
hood to keep light off lens.”
[Oct, 2012]
Illustrative Example for Review Communities
32
● Consider following camera reviews by the same user John:
Mukherjee et al.: ICDM 2015, SIGKDD 2016
33
“The EF 75-300 mm lens is only good to be used outside. The 2.2X
HD lens can only be used for specific items; filters are useless if
ISO, AP,... . The short 18-55mm lens is cheap and should have a
hood to keep light off lens.”
[Oct, 2012]
Illustrative Example for Review Communities
● Consider following camera reviews by John:
“My first DSLR. Excellent camera, takes great pictures with high
definition, without a doubt it makes honor to its name.”
[Aug, 1997]
How can we quantify this change
in users’ maturity / experience ?
How can we model this evolution
/ progression in users’ maturity?
Mukherjee et al.: ICDM 2015, SIGKDD 2016
Prior Work: Discrete Experience Evolution
34
Assumption: At each timepoint a user
remains at the same level of experience, or
moves to the next level
1. Users at similar levels of experience
have similar facet preferences, and
rating style (McAuley and Leskovec:
WWW 2013)
2. Additionally, our work exploits
similar writing style (Mukherjee,
Lamba and Weikum: ICDM 2015)
Language Model (KL) Divergence Increases with Experience
35Experienced users have a distinctive writing style different than that of amateurs
1. Users at similar levels of experience
have similar facet preferences, and
rating style (McAuley and Leskovec,
WWW 2013)
2. Additionally, our work exploits
similar writing style (Mukherjee,
Lamba and Weikum, ICDM 2015)
Prior Work: Discrete Experience Evolution
36
Assumption: At each timepoint a user
remains at the same level of experience, or
moves to the next level
Abrupt Transition
Continuous Experience Evolution
(Mukherjee, Günnemann and Weikum, SIGKDD 2016)
37
Continuous Experience Evolution: Assumptions
★ Continuous-time process, always positive
★ Markovian assumption: Experience at time t depends on that at t-1
★ Drift: Overall trend to increase over time
★ Volatility: Progression may not be smooth with occasional volatility
E.g.: series of expert reviews followed by a sloppy one
38
Geometric Brownian Motion
39
We show these properties to be satisfied by the continuous-time
stochastic process: Geometric Brownian Motion
Language Model (LM) Evolution
● Users' LM also evolve with experience evolution
● Smoothly evolve over time preserving Markov property of experience
evolution
● Variance of LM should change with experience change
● Brownian Motion to model this desiderata:
40
Inference
41
Topic Model (Blei et
al., JMLR '03)
Users ( Author-topic
model,Rosen-Zvi et
al., UAI '04)
Continuous Time
(Dynamic topic model,
Wang et al., UAI '08)
Continuous
Experience
(this work)
+
+
+
Sampling based Inference
Kalman Filter for LM
evolution
Metropolis Hastings
for Exp. evolution
42
Gibbs Sampling for Facets
Kalman Filter for LM
evolution
Metropolis Hastings
for Exp. evolution
43
Gibbs Sampling for Facets
Sampling based Inference
Dataset Statistics
44
Can we recommend items better, if we consider
users’ experience to consume them?
45
Log-likelihood, Smoothness, and Convergence
46
Interpretability: Top Words* by Experienced Users
47
Most Experience Least Experience
BeerAdvocate chestnut_hued near_viscous cherry_wood
sweet_burning faint_vanilla woody_herbal
citrus_hops mouthfeel
originally flavor color poured
pleasant bad bitter sweet
Amazon aficionados minimalist underwritten
theatrically unbridled seamless retrospect
overdramatic
viewer entertainment battle actress
tells emotional supporting
Yelp smoked marinated savory signature
contemporary selections delicate texture
mexican chicken salad love better
eat atmosphere sandwich
NewsTrust health actions cuts medicare oil climate
spending unemployment
bad god religion iraq responsibility
questions clear powerful
*Learned by our generative model without supervision
Interpretability: Top Words* by Experienced Users
48
Most Experience Least Experience
BeerAdvocate chestnut_hued near_viscous cherry_wood
sweet_burning faint_vanilla woody_herbal
citrus_hops mouthfeel
originally flavor color poured
pleasant bad bitter sweet
Amazon aficionados minimalist underwritten
theatrically unbridled seamless retrospect
overdramatic
viewer entertainment battle actress
tells emotional supporting
Yelp smoked marinated savory signature
contemporary selections delicate texture
mexican chicken salad love better
eat atmosphere sandwich
NewsTrust health actions cuts medicare oil climate
spending unemployment
bad god religion iraq responsibility
questions clear powerful
*Learned by our generative model without supervision
Experienced users in the beer
community use more “fruity” words
to describe taste and smell of beers
Experienced users in the news
community discuss about policies and
regulations in contrast to amateurs
interested on polarizing topics
Take-away
● Insights from Geometric Brownian Motion trajectory of users:
○ Experienced users mature faster than amateurs
○ Progression depends more on time spent in community than on activity
● Users' experience evolve continuously, along with language usage
● Recommendation models can be improved by considering users’ maturity
● Learns from only the information of users reviewing products at explicit timepoints ---
no meta-data, community-specific / platform dependent features --- easy to
generalize across different communities 49
Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online
Communities
↘ Temporal Evolution of Online
Communities
↘ Credibility Analysis of
Product Reviews
● Conclusions
50
Can we use this
framework to find
helpful product
reviews?
● Reviews (e.g., camera) with similar
facet-sentiment distribution (e.g.,
bashing “zoom” and “resolution”) are
likely to be equally helpful.
Distributional Hypotheses
Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum: SDM 2017 51
● Users with similar facet preferences and
expertise are likely to be equally helpful.
We analyze consistency of
embeddings from previous
models to detect fake /
anomalous reviews with
discrepancies like:
Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum: ECML-PKDD 2016
1. Rating and review description
(promotion/demotion)
Excellent product... technical support is almost non-existent ...
this is unacceptable. [4]
2. Rating and Facet description (irrelevant)
DO NOT BUY THIS. I can’t file because Turbo Tax doesn’t have
software updates from the IRS “because of Hurricane Katrina”. [1]
3. Temporal bursts (group spamming)
Dan’s apartment was beautiful, a great location. (3/14/2012)[5]
I highly recommend working with Dan and... (3/14/2012) [5]
Dan is super friendly, confident... (3/14/2012) [4]
Consistency Analysis
of Product Reviews
52
Future Work
53
★ Going beyond topics and bag-of-words features / lexicons
Learning linguistic cues from embeddings
★ Applications to tasks like Anomaly Detection, Community
Question-Answering, Knowledge-base Curation etc.
★ Incorporating richer facets like multi-modal interactions,
stance, influence evolution etc.
1. How can we develop models that jointly
leverage users, network, and context for
credibility analysis in online communities?
2. How can we model users’ evolution or
progression in maturity?
3. How can we deal with the limited
information scenario?
4. How can we generate interpretable
explanations for credibility verdict?
54
Interactional Framework for
Credibility Analysis
Conclusions
1. How can we jointly leverage users,
network, and context for credibility
analysis in online communities?
2. How can we model users’ evolution?
3. How can we deal with limited data?
4. How can we generate interpretable
explanations for credibility verdict?
55
Collaborators and Co-authors
Gerhard
Weikum
(Advisor)
Kashyap
Popat
Jannik
Strötgen
Cristian
Danescu-Nicul
escu-Mizil
Stephan
Günnemann
Hemank
Lamba
Sourav
Dutta
Acknowledgments (1/3)
56
Dissertation Committee
Gerhard
Weikum
Jiawei
Han
Stephan
Günnemann
Acknowledgments (2/3)
Dietrich
Klakow
57
Databases and Information Systems Department at Max Planck Institute
Acknowledgments (3/3)
1. How can we develop models that jointly
leverage users, network, and context for
credibility analysis in online communities?
2. How can we model users’ evolution or
progression in maturity?
3. How can we deal with the limited
information scenario?
4. How can we generate interpretable
explanations for credibility verdict?
58
Interactional Framework for
Credibility Analysis
1. How can we jointly leverage users,
network, and context for credibility
analysis in online communities?
2. How can we model users’ evolution?
3. How can we deal with limited data?
4. How can we generate interpretable
explanations for credibility verdict?
THANKS!

Mais conteúdo relacionado

Semelhante a Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

IRJET- Analyzing Sentiments in One Go
IRJET-  	  Analyzing Sentiments in One GoIRJET-  	  Analyzing Sentiments in One Go
IRJET- Analyzing Sentiments in One GoIRJET Journal
 
An Unsupervised Approach For Reputation Generation
An Unsupervised Approach For Reputation GenerationAn Unsupervised Approach For Reputation Generation
An Unsupervised Approach For Reputation GenerationKayla Jones
 
Hybrid Personalized Recommender System Using Modified Fuzzy C-Means Clusterin...
Hybrid Personalized Recommender System Using Modified Fuzzy C-Means Clusterin...Hybrid Personalized Recommender System Using Modified Fuzzy C-Means Clusterin...
Hybrid Personalized Recommender System Using Modified Fuzzy C-Means Clusterin...Waqas Tariq
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsAxel de Romblay
 
Human-centered AI: how can we support lay users to understand AI?
Human-centered AI: how can we support lay users to understand AI?Human-centered AI: how can we support lay users to understand AI?
Human-centered AI: how can we support lay users to understand AI?Katrien Verbert
 
Webinar Agile Presentation V.1.0
Webinar Agile Presentation V.1.0Webinar Agile Presentation V.1.0
Webinar Agile Presentation V.1.0fhios
 
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SDiscovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SIRJET Journal
 
IRJET- Enhancing NLP Techniques for Fake Review Detection
IRJET- Enhancing NLP Techniques for Fake Review DetectionIRJET- Enhancing NLP Techniques for Fake Review Detection
IRJET- Enhancing NLP Techniques for Fake Review DetectionIRJET Journal
 
CAMUG - Sept 3, 2020 - User Story Quality Matters
CAMUG - Sept 3, 2020 - User Story Quality MattersCAMUG - Sept 3, 2020 - User Story Quality Matters
CAMUG - Sept 3, 2020 - User Story Quality MattersChris Edwards, P.Eng.
 
Book recommendation system using opinion mining technique
Book recommendation system using opinion mining techniqueBook recommendation system using opinion mining technique
Book recommendation system using opinion mining techniqueeSAT Journals
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation SystemsAxel de Romblay
 
ASA conference Feb 2013
ASA conference Feb 2013ASA conference Feb 2013
ASA conference Feb 2013mrkwr
 
IRJET- Opinion Mining from Customer Reviews for Predicting Competitors
IRJET- Opinion Mining from Customer Reviews for Predicting CompetitorsIRJET- Opinion Mining from Customer Reviews for Predicting Competitors
IRJET- Opinion Mining from Customer Reviews for Predicting CompetitorsIRJET Journal
 
IRJET- Hybrid Book Recommendation System
IRJET- Hybrid Book Recommendation SystemIRJET- Hybrid Book Recommendation System
IRJET- Hybrid Book Recommendation SystemIRJET Journal
 
Exploration & Promotion: Implementation Strategies of Corporate Social Software
Exploration & Promotion: Implementation Strategies of Corporate Social SoftwareExploration & Promotion: Implementation Strategies of Corporate Social Software
Exploration & Promotion: Implementation Strategies of Corporate Social SoftwareAlexander Stocker
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET Journal
 
TWITTER SENTIMENT ANALYSIS
TWITTER SENTIMENT ANALYSISTWITTER SENTIMENT ANALYSIS
TWITTER SENTIMENT ANALYSISIRJET Journal
 
TWITTER SENTIMENT ANALYSIS
TWITTER SENTIMENT ANALYSISTWITTER SENTIMENT ANALYSIS
TWITTER SENTIMENT ANALYSISIRJET Journal
 
Online review mining for forecasting sales
Online review mining for forecasting salesOnline review mining for forecasting sales
Online review mining for forecasting saleseSAT Publishing House
 
Online review mining for forecasting sales
Online review mining for forecasting salesOnline review mining for forecasting sales
Online review mining for forecasting saleseSAT Journals
 

Semelhante a Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities (20)

IRJET- Analyzing Sentiments in One Go
IRJET-  	  Analyzing Sentiments in One GoIRJET-  	  Analyzing Sentiments in One Go
IRJET- Analyzing Sentiments in One Go
 
An Unsupervised Approach For Reputation Generation
An Unsupervised Approach For Reputation GenerationAn Unsupervised Approach For Reputation Generation
An Unsupervised Approach For Reputation Generation
 
Hybrid Personalized Recommender System Using Modified Fuzzy C-Means Clusterin...
Hybrid Personalized Recommender System Using Modified Fuzzy C-Means Clusterin...Hybrid Personalized Recommender System Using Modified Fuzzy C-Means Clusterin...
Hybrid Personalized Recommender System Using Modified Fuzzy C-Means Clusterin...
 
Udacity webinar on Recommendation Systems
Udacity webinar on Recommendation SystemsUdacity webinar on Recommendation Systems
Udacity webinar on Recommendation Systems
 
Human-centered AI: how can we support lay users to understand AI?
Human-centered AI: how can we support lay users to understand AI?Human-centered AI: how can we support lay users to understand AI?
Human-centered AI: how can we support lay users to understand AI?
 
Webinar Agile Presentation V.1.0
Webinar Agile Presentation V.1.0Webinar Agile Presentation V.1.0
Webinar Agile Presentation V.1.0
 
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SDiscovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
 
IRJET- Enhancing NLP Techniques for Fake Review Detection
IRJET- Enhancing NLP Techniques for Fake Review DetectionIRJET- Enhancing NLP Techniques for Fake Review Detection
IRJET- Enhancing NLP Techniques for Fake Review Detection
 
CAMUG - Sept 3, 2020 - User Story Quality Matters
CAMUG - Sept 3, 2020 - User Story Quality MattersCAMUG - Sept 3, 2020 - User Story Quality Matters
CAMUG - Sept 3, 2020 - User Story Quality Matters
 
Book recommendation system using opinion mining technique
Book recommendation system using opinion mining techniqueBook recommendation system using opinion mining technique
Book recommendation system using opinion mining technique
 
[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems[UPDATE] Udacity webinar on Recommendation Systems
[UPDATE] Udacity webinar on Recommendation Systems
 
ASA conference Feb 2013
ASA conference Feb 2013ASA conference Feb 2013
ASA conference Feb 2013
 
IRJET- Opinion Mining from Customer Reviews for Predicting Competitors
IRJET- Opinion Mining from Customer Reviews for Predicting CompetitorsIRJET- Opinion Mining from Customer Reviews for Predicting Competitors
IRJET- Opinion Mining from Customer Reviews for Predicting Competitors
 
IRJET- Hybrid Book Recommendation System
IRJET- Hybrid Book Recommendation SystemIRJET- Hybrid Book Recommendation System
IRJET- Hybrid Book Recommendation System
 
Exploration & Promotion: Implementation Strategies of Corporate Social Software
Exploration & Promotion: Implementation Strategies of Corporate Social SoftwareExploration & Promotion: Implementation Strategies of Corporate Social Software
Exploration & Promotion: Implementation Strategies of Corporate Social Software
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
 
TWITTER SENTIMENT ANALYSIS
TWITTER SENTIMENT ANALYSISTWITTER SENTIMENT ANALYSIS
TWITTER SENTIMENT ANALYSIS
 
TWITTER SENTIMENT ANALYSIS
TWITTER SENTIMENT ANALYSISTWITTER SENTIMENT ANALYSIS
TWITTER SENTIMENT ANALYSIS
 
Online review mining for forecasting sales
Online review mining for forecasting salesOnline review mining for forecasting sales
Online review mining for forecasting sales
 
Online review mining for forecasting sales
Online review mining for forecasting salesOnline review mining for forecasting sales
Online review mining for forecasting sales
 

Mais de Subhabrata Mukherjee

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsSubhabrata Mukherjee
 
OpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesOpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesSubhabrata Mukherjee
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language ModelSubhabrata Mukherjee
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesSubhabrata Mukherjee
 
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Subhabrata Mukherjee
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesSubhabrata Mukherjee
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsSubhabrata Mukherjee
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Subhabrata Mukherjee
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelSubhabrata Mukherjee
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterSubhabrata Mukherjee
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Subhabrata Mukherjee
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilaritySubhabrata Mukherjee
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...Subhabrata Mukherjee
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviewsSubhabrata Mukherjee
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...Subhabrata Mukherjee
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSubhabrata Mukherjee
 

Mais de Subhabrata Mukherjee (17)

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
 
Fact Checking from Text
Fact Checking from TextFact Checking from Text
Fact Checking from Text
 
OpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesOpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product Profiles
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language Model
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review Communities
 
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News Communities
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health Forums
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic Model
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word Similarity
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviews
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
 

Último

Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 

Último (20)

Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 

Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

  • 1. Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities Subhabrata Mukherjee Max Planck Institute for Informatics, Germany smukherjee@mpi-inf.mpg.de
  • 2. Outline ● Motivation ● Prior Work and its Limitations ● Credibility Analysis ↘ Framework for Online Communities ↘ Temporal Evolution of Online Communities ↘ Credibility Analysis of Product Reviews ● Conclusions 2
  • 3. Online Communities as a Knowledge Resource ● Online communities are massive repositories of knowledge accessed by regular users and professionals ↘ 59% of adult U.S. population and half of U.S. physicians rely on online resources [IMS Health Report, 2014] ↘ 40% of online consumers consult online reviews before buying products [Nielson Corporation, 2016] ● However their usability is restricted due to serious credibility concerns (e.g., spams, misinformation, bias etc.)
  • 4. Concerns 4 Misinformation for health can have hazardous consequences “Rapid spread of misinformation online” --- one of top 10 challenges as per The World Economic Forum
  • 5. Outline ● Motivation ● Prior Work and its Limitations ● Credibility Analysis ↘ Framework for Online Communities ↘ Temporal Evolution of Online Communities ↘ Credibility Analysis of Product Reviews ● Conclusions 5
  • 6. Truth Finding Structured data (e.g., SPO triples, tables, networks) Objective facts (e.g., Obama_BornIn_Hawaii vs. Obama_BornIn_Kenya) No contextual data (text) No external KB, metadata 6 Linguistic Analysis Unstructured text Subjective information (e.g., opinion spam, bias, viewpoint ) External KB (e.g., WordNet, KG) No network / interactions, metadata
  • 7. 1. How can we jointly leverage users, network, and context for credibility analysis in online communities? 2. How can we model users’ evolution? 3. How can we deal with limited data? 4. How can we generate interpretable explanations for credibility verdict? 7 Research Questions
  • 8. Contributions ● Credibility Analysis Framework for Online Communities ↘ Classification: Health Communities [SIGKDD 2014] ↘ Regression: News Communities [CIKM 2015] ● Temporal Evolution of Online Communities [ICDM 2015, SIGKDD 2016] ● Credibility Analysis of Product Reviews [ECML-PKDD 2016, SDM 2017] 8
  • 9. Outline ● Motivation ● Prior Work and its Limitations ● Credibility Analysis ↘ Framework for Online Communities ↘ Temporal Evolution of Online Communities ↘ Credibility Analysis of Product Reviews ● Conclusions 9
  • 10. What is Credibility? 10 “A statement is credible if it is reported by a trustworthy user in an objective language” “Trustworthy users corroborate each other on credible statements”
  • 11. Credibility Analysis Framework for Classification Problem: Given a set of posts from different users, extract credible statements (subject-predicate-object triples like DrugX_HasSideEffect_Y) from trustworthy users 11Subhabrata Mukherjee, Gerhard Weikum and Cristian Danescu-Niculescu-Mizil: SIGKDD 2014
  • 12. Credibility Analysis Framework for Classification Subhabrata Mukherjee, Gerhard Weikum and Cristian Danescu-Niculescu-Mizil: SIGKDD 2014 12 Problem: Given a set of posts from different users, extract credible statements (subject-predicate-object triples like DrugX_HasSideEffect_Y) from trustworthy users
  • 13. Network of Interactions: Cliques Statements: An IE tool generates candidate triple patterns like: Xanax_causes_headache, Xanax_gave_demonic-feel Potentially thousands of such triples, with only a handful of credible ones ➔ Each user, post, and statement is a random variable with edges depicting interactions. Variables have observable features (e.g, authority, emotionality). ➔ A clique is formed between each user writing a post containing a statement. 13
  • 14. Network of Interactions: Cliques Idea: Trustworthy users corroborate on credible statements in objective language Statements: An IE tool generates candidate triple patterns like: Xanax_causes_headache, Xanax_gave_demonic-feel Potentially thousands of such triples, with only a handful of credible ones Each user, post, and statement is a random variable with edges depicting interactions 14
  • 15. 15 Conditional Random Field to Exploit Joint Interactions (Users + Network + Context) Partial Supervision: Expert stated (top 20%) side-effects of drugs as partial training labels. Model predicts labels of unobserved statements. How to complement expert medical knowledge with large scale non-expert data?
  • 16. Semi-Supervised Conditional Random Field 1. Estimate user trustworthiness: 2. Estimate label of unknown statements Su by Gibbs Sampling: 3. Maximize log-likelihood to estimate feature weights: 4. Apply E-Step and M-Step till convergence 16
  • 17. Healthforum Dataset ● Healthboards.com community (www.healthboards.com) with 850,000 registered users and 4.5 million posts 17 ● Expert labels about drugs from MayoClinic (www.mayoclinic.org) ↘ 6 widely used drugs for experimentation
  • 18. 18
  • 19. 19 What constitutes credible language? Affective Emotions confidence sympathy self-esteem eagerness coolness compunction anxiety embarrassment misery distress
  • 20. 20 What constitutes credible language? determiner (this, that,..) negation (not, never, ..) second person (you, ..) conjunction (therefore, consequently, ..) contrast (despite, though, ..) question (what, why, ..) conditional (if) adverb (maybe, probably, ..) modality (might, could, ..) Discourse and Modalities
  • 21. 21 Credibility Analysis Framework for Regression In many online communities users rate items on their quality
  • 22. Credibility Analysis in News Communities Topics Climate Change Sources trunews.com Articles “Global warming is a hoax” Sources / Users Scientificamerican.com snopes.com user-donald Reviews & Ratings scientific analysis, 1.5/ 5, conspiratory theory 22 However, user feedback is often subjective; influenced by their bias and viewpoints
  • 23. Reviews / Ratings scientific analysis, 1.5/ 5, conspiratory theory Topics Climate Change Articles “Global warming is a hoax” Sources trunews.com Sources / Users Scientificamerican.com snopes.com user-donald Idea: Trustworthy sources publish objective articles corroborated by expert users with credible reviews/ratings 23 We use CRF to capture these mutual interactions in news communities (e.g., newstrust.net, digg, reddit) to jointly rank all of the underlying factors. Credibility Analysis Framework for Regression
  • 24. Online Communities: Factors Related to Ensemble Learning, Learning to Rank
  • 25. How to incorporate continuous ratings instead of discrete labels in CRF ? 25Subhabrata Mukherjee and Gerhard Weikum: CIKM 2015 Probability Mass Function for discrete labels: Probability Density Function for continuous ratings:
  • 26. Energy Function to Combine All
  • 27. How to incorporate continuous ratings instead of discrete labels in CRF ? 27 ● We show that a certain energy function for clique potential --- geared for reducing mean-squared-error --- results in multivariate gaussian p.d.f. !!! ● Constrained Gradient Ascent for inference Subhabrata Mukherjee and Gerhard Weikum: CIKM 2015
  • 28. Predicting Article Credibility Ratings in Newstrust.net 28 Progressive decrease in mean squared error with more network interactions, and context
  • 29. Take-away ● Semi-supervised and Continuous CRF to jointly identify trustworthy users, credible statements, and reliable postings in online communities ● A framework to incorporate richer aspects like user expertise, topics / facets, temporal evolution etc. 29
  • 30. Outline ● Motivation ● Prior Work and its Limitations ● Credibility Analysis ↘ Framework for Online Communities ↘ Temporal Evolution of Online Communities ↘ Credibility Analysis of Product Reviews ● Conclusions 30
  • 31. ● Online communities are dynamic, as users join and leave; acquire new vocabulary; evolve and mature over time ● Trustworthiness and expertise of users evolve over time Temporal Evolution 31 How to capture evolving user expertise?
  • 32. “My first DSLR. Excellent camera, takes great pictures with high definition, without a doubt it makes honor to its name.” [Aug, 1997] “The EF 75-300 mm lens is only good to be used outside. The 2.2X HD lens can only be used for specific items; filters are useless if ISO, AP,... . The short 18-55mm lens is cheap and should have a hood to keep light off lens.” [Oct, 2012] Illustrative Example for Review Communities 32 ● Consider following camera reviews by the same user John: Mukherjee et al.: ICDM 2015, SIGKDD 2016
  • 33. 33 “The EF 75-300 mm lens is only good to be used outside. The 2.2X HD lens can only be used for specific items; filters are useless if ISO, AP,... . The short 18-55mm lens is cheap and should have a hood to keep light off lens.” [Oct, 2012] Illustrative Example for Review Communities ● Consider following camera reviews by John: “My first DSLR. Excellent camera, takes great pictures with high definition, without a doubt it makes honor to its name.” [Aug, 1997] How can we quantify this change in users’ maturity / experience ? How can we model this evolution / progression in users’ maturity? Mukherjee et al.: ICDM 2015, SIGKDD 2016
  • 34. Prior Work: Discrete Experience Evolution 34 Assumption: At each timepoint a user remains at the same level of experience, or moves to the next level 1. Users at similar levels of experience have similar facet preferences, and rating style (McAuley and Leskovec: WWW 2013) 2. Additionally, our work exploits similar writing style (Mukherjee, Lamba and Weikum: ICDM 2015)
  • 35. Language Model (KL) Divergence Increases with Experience 35Experienced users have a distinctive writing style different than that of amateurs
  • 36. 1. Users at similar levels of experience have similar facet preferences, and rating style (McAuley and Leskovec, WWW 2013) 2. Additionally, our work exploits similar writing style (Mukherjee, Lamba and Weikum, ICDM 2015) Prior Work: Discrete Experience Evolution 36 Assumption: At each timepoint a user remains at the same level of experience, or moves to the next level Abrupt Transition
  • 37. Continuous Experience Evolution (Mukherjee, Günnemann and Weikum, SIGKDD 2016) 37
  • 38. Continuous Experience Evolution: Assumptions ★ Continuous-time process, always positive ★ Markovian assumption: Experience at time t depends on that at t-1 ★ Drift: Overall trend to increase over time ★ Volatility: Progression may not be smooth with occasional volatility E.g.: series of expert reviews followed by a sloppy one 38
  • 39. Geometric Brownian Motion 39 We show these properties to be satisfied by the continuous-time stochastic process: Geometric Brownian Motion
  • 40. Language Model (LM) Evolution ● Users' LM also evolve with experience evolution ● Smoothly evolve over time preserving Markov property of experience evolution ● Variance of LM should change with experience change ● Brownian Motion to model this desiderata: 40
  • 41. Inference 41 Topic Model (Blei et al., JMLR '03) Users ( Author-topic model,Rosen-Zvi et al., UAI '04) Continuous Time (Dynamic topic model, Wang et al., UAI '08) Continuous Experience (this work) + + +
  • 42. Sampling based Inference Kalman Filter for LM evolution Metropolis Hastings for Exp. evolution 42 Gibbs Sampling for Facets
  • 43. Kalman Filter for LM evolution Metropolis Hastings for Exp. evolution 43 Gibbs Sampling for Facets Sampling based Inference
  • 45. Can we recommend items better, if we consider users’ experience to consume them? 45
  • 47. Interpretability: Top Words* by Experienced Users 47 Most Experience Least Experience BeerAdvocate chestnut_hued near_viscous cherry_wood sweet_burning faint_vanilla woody_herbal citrus_hops mouthfeel originally flavor color poured pleasant bad bitter sweet Amazon aficionados minimalist underwritten theatrically unbridled seamless retrospect overdramatic viewer entertainment battle actress tells emotional supporting Yelp smoked marinated savory signature contemporary selections delicate texture mexican chicken salad love better eat atmosphere sandwich NewsTrust health actions cuts medicare oil climate spending unemployment bad god religion iraq responsibility questions clear powerful *Learned by our generative model without supervision
  • 48. Interpretability: Top Words* by Experienced Users 48 Most Experience Least Experience BeerAdvocate chestnut_hued near_viscous cherry_wood sweet_burning faint_vanilla woody_herbal citrus_hops mouthfeel originally flavor color poured pleasant bad bitter sweet Amazon aficionados minimalist underwritten theatrically unbridled seamless retrospect overdramatic viewer entertainment battle actress tells emotional supporting Yelp smoked marinated savory signature contemporary selections delicate texture mexican chicken salad love better eat atmosphere sandwich NewsTrust health actions cuts medicare oil climate spending unemployment bad god religion iraq responsibility questions clear powerful *Learned by our generative model without supervision Experienced users in the beer community use more “fruity” words to describe taste and smell of beers Experienced users in the news community discuss about policies and regulations in contrast to amateurs interested on polarizing topics
  • 49. Take-away ● Insights from Geometric Brownian Motion trajectory of users: ○ Experienced users mature faster than amateurs ○ Progression depends more on time spent in community than on activity ● Users' experience evolve continuously, along with language usage ● Recommendation models can be improved by considering users’ maturity ● Learns from only the information of users reviewing products at explicit timepoints --- no meta-data, community-specific / platform dependent features --- easy to generalize across different communities 49
  • 50. Outline ● Motivation ● Prior Work and its Limitations ● Credibility Analysis ↘ Framework for Online Communities ↘ Temporal Evolution of Online Communities ↘ Credibility Analysis of Product Reviews ● Conclusions 50
  • 51. Can we use this framework to find helpful product reviews? ● Reviews (e.g., camera) with similar facet-sentiment distribution (e.g., bashing “zoom” and “resolution”) are likely to be equally helpful. Distributional Hypotheses Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum: SDM 2017 51 ● Users with similar facet preferences and expertise are likely to be equally helpful.
  • 52. We analyze consistency of embeddings from previous models to detect fake / anomalous reviews with discrepancies like: Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum: ECML-PKDD 2016 1. Rating and review description (promotion/demotion) Excellent product... technical support is almost non-existent ... this is unacceptable. [4] 2. Rating and Facet description (irrelevant) DO NOT BUY THIS. I can’t file because Turbo Tax doesn’t have software updates from the IRS “because of Hurricane Katrina”. [1] 3. Temporal bursts (group spamming) Dan’s apartment was beautiful, a great location. (3/14/2012)[5] I highly recommend working with Dan and... (3/14/2012) [5] Dan is super friendly, confident... (3/14/2012) [4] Consistency Analysis of Product Reviews 52
  • 53. Future Work 53 ★ Going beyond topics and bag-of-words features / lexicons Learning linguistic cues from embeddings ★ Applications to tasks like Anomaly Detection, Community Question-Answering, Knowledge-base Curation etc. ★ Incorporating richer facets like multi-modal interactions, stance, influence evolution etc.
  • 54. 1. How can we develop models that jointly leverage users, network, and context for credibility analysis in online communities? 2. How can we model users’ evolution or progression in maturity? 3. How can we deal with the limited information scenario? 4. How can we generate interpretable explanations for credibility verdict? 54 Interactional Framework for Credibility Analysis Conclusions 1. How can we jointly leverage users, network, and context for credibility analysis in online communities? 2. How can we model users’ evolution? 3. How can we deal with limited data? 4. How can we generate interpretable explanations for credibility verdict?
  • 57. 57 Databases and Information Systems Department at Max Planck Institute Acknowledgments (3/3)
  • 58. 1. How can we develop models that jointly leverage users, network, and context for credibility analysis in online communities? 2. How can we model users’ evolution or progression in maturity? 3. How can we deal with the limited information scenario? 4. How can we generate interpretable explanations for credibility verdict? 58 Interactional Framework for Credibility Analysis 1. How can we jointly leverage users, network, and context for credibility analysis in online communities? 2. How can we model users’ evolution? 3. How can we deal with limited data? 4. How can we generate interpretable explanations for credibility verdict? THANKS!