Pavan Kapanipathi*, Prateek Jain^,
Chitra Venkataramani^, Amit Sheth*
*Kno.e.sis Center, Wright State University
^IBM TJ Watson Research Center
1
#eswc2014Kapanipathi
• Motivation
• Background
• Approach
• Evaluation
• Conclusion & Future Work
2
• Motivation
• Approach
• Evaluation
• Conclusion & Future Work
3
• Tapping into social networks to identify interests is not new (2006+), and it works.
◦ Google, Bing, Samsung TV, etc.
• Twitter content
◦ 500M+ users generating 500M+ tweets per day.
◦ Public and useful for research
4
• Interests with little or no semantics
◦ Bag of Words [1]
◦ Bag of Concepts
• Some semantics
◦ Bag of Linked Entities, intended for use with Knowledge Bases [2, 3]
5
1. Alan Mislove, Bimal Viswanath, Krishna P. Gummadi, and Peter Druschel. You Are Who You Know: Inferring User
Profiles in Online Social Networks. WSDM ’10.
2. Fabian Abel, Qi Gao, Geert-Jan Houben, and Ke Tao. Analyzing User Modeling on Twitter for Personalized News
Recommendations. UMAP ’11
3. Fabrizio Orlandi, John Breslin, and Alexandre Passant. Aggregated, Interoperable and Multi-domain User Profiles
for the Social Web. I-SEMANTICS ’12.
6
• How can semantics and Knowledge Bases be utilized to infer interests?
◦ Extensive use of Knowledge Bases to infer user interests from tweets is yet to be explored.
• We started by utilizing hierarchical relationships.
7
[Figure: hierarchy diagram. Labels: Internet, Semantic Search, Linked Data, Metadata, Technology, World Wide Web, Semantic Web, Structured Information, Entities]
8
• Addressing the data sparsity problem
◦ Infer more user interests from less data.
• Flexibility for recommendations
◦ Recommend content about Sports or Football
▪ The KB knows that Football is a sub-category of Sports
◦ Resource Description Framework and Semantic Web
▪ RDF has less data online to recommend.
9
• Motivation
• Approach
• Evaluation
• Conclusion & Future Work
10
11
[Figure labels: Tweets, Interest Hierarchy]
12
[Figure labels: Tweets, Interest Hierarchy]
• Selecting an ontology
◦ Available: Wikipedia, Dmoz, OpenCyc, Freebase
◦ Our framework can adapt to any ontology
• Wikipedia
◦ Diverse domains and coverage
◦ Resemblance to a taxonomy
◦ Structured data extracted from Wikipedia: DBpedia
◦ Existing entity recognition techniques (explained later)
13
• 4.2 million articles
• 0.8 million Wikipedia categories
• 2.0 million category-subcategory relationships
• Challenges
◦ Crowd-sourced, hence noisy
◦ Not a hierarchy/taxonomy
▪ It is a graph
▪ It has cycles
14
• Clean-up: removed Wikipedia administrative categories
• The Hierarchical Interest Graph needs a base hierarchy
◦ Shortest path from the root node
▪ Root node: Category:Main Topic Classifications
▪ Assumption: the number of hops to the root node determines the level of abstraction of the category.
15
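The shortest-path idea above can be sketched with a breadth-first search. This is a minimal illustration, not the authors' implementation: the function name `assign_levels`, the dictionary layout, and the toy edges are all assumptions; only the root category and the convention that the root sits at level 1 come from the slides.

```python
from collections import deque


def assign_levels(subcategories, root):
    """Breadth-first search from the root.

    The level of a category is its hop distance from the root plus one,
    so the root itself is at level 1.
    """
    levels = {root: 1}
    queue = deque([root])
    while queue:
        category = queue.popleft()
        for child in subcategories.get(category, []):
            if child not in levels:  # the first visit is the shortest path
                levels[child] = levels[category] + 1
                queue.append(child)
    return levels


# Toy graph with hypothetical edges, including a cycle of the kind
# found in the Wikipedia category graph.
toy_graph = {
    "Main topic classifications": ["Science", "Sports", "Health"],
    "Science": ["Scientists", "Science Education"],
    "Health": ["Health Care", "Health Economics"],
    "Scientists": ["Science"],  # link back to a more abstract category
}
levels = assign_levels(toy_graph, "Main topic classifications")
```

Because each category is labeled on its first visit, cycles do not cause repeated work and every category receives its minimum distance from the root.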
16
[Figure: base hierarchy rooted at Main topic classifications, drawn across Levels 1 to 3. Labels: Agriculture, Science, Sports, Health, Science Education, Scientists, Health Care, Health Economics]
• Removing links that do not conform to a hierarchy
17
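One way to read this step is as an edge filter over the level assignment. The rule below (keep a link only if the subcategory lies at a strictly deeper level than its parent) is an assumption made for illustration; the slides do not spell out the exact criterion, and `prune_non_hierarchical` and the toy data are hypothetical.

```python
def prune_non_hierarchical(edges, levels):
    """Keep only (parent, child) links that point to a strictly deeper level."""
    return [
        (parent, child)
        for parent, child in edges
        if parent in levels
        and child in levels
        and levels[child] > levels[parent]
    ]


# Hypothetical levels and links for a small part of the category graph.
levels = {
    "Main topic classifications": 1,
    "Science": 2,
    "Health": 2,
    "Scientists": 3,
}
edges = [
    ("Main topic classifications", "Science"),
    ("Main topic classifications", "Health"),
    ("Science", "Scientists"),
    ("Scientists", "Science"),  # points back up the hierarchy: removed
    ("Science", "Health"),      # stays on the same level: removed
]
kept = prune_non_hierarchical(edges, levels)
```

Under this rule the result is acyclic by construction, since every remaining link increases the level.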
18
[Figure labels: Tweets, Interest Hierarchy]
• Extracting Wikipedia concepts from tweets
• Scoring interests
19
http://en.wikipedia.org/wiki/Semantic_search
http://en.wikipedia.org/wiki/Ontology
◦ Issues relevant to entity extraction are handled by
the web services
▪ Stop-word removal, URLs, disambiguation, etc.
20
Service            Precision  Recall  F-measure  Usability               Rate Limit  License
DBpedia Spotlight  20.1       47.5    28.3       In-house + Web Service  N/A         Apache 2.0
Text Razor         64.6       26.9    38.0       Web Service             500/day
Zemanta            57.7       31.8    41.0       Web Service             10000/day
*L. Derczynski, D. Maynard, N. Aswani, and K. Bontcheva. Microblog-genre noise and impact on semantic annotation accuracy.
In Proceedings of the 24th ACM Conference on Hypertext and Social Media, HT ’13.
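As a quick consistency check on the table, the sketch below recomputes the F-measure from the reported precision and recall, assuming the standard balanced F-measure (harmonic mean). The function name `f_measure` is illustrative. The recomputed values agree with the reported ones to within 0.1, which is expected given that the inputs are themselves rounded.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) as listed in the table.
reported = {
    "DBpedia Spotlight": (20.1, 47.5, 28.3),
    "Text Razor": (64.6, 26.9, 38.0),
    "Zemanta": (57.7, 31.8, 41.0),
}

differences = {
    service: abs(f_measure(p, r) - f_reported)
    for service, (p, r, f_reported) in reported.items()
}
for service, diff in differences.items():
    print(f"{service}: |computed - reported| = {diff:.3f}")
```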
• Scoring Wikipedia concepts
21
[Figure: hierarchy diagram. Labels: Internet, Semantic Search, Linked Data, Metadata, Technology, World Wide Web, Semantic Web, Structured Information, User Interests. Scores for interests: 0.8, 0.2, 0.6]
22
23
[Figure labels: Tweets, Interest Hierarchy]
• Result (challenges)
◦ Infers more categories, without context
◦ Equal weights regardless of interest score
◦ Cannot rank categories of interest for a user
◦ We use Spreading Activation
24
[Figure: Wikipedia category graph example. Labels: Cricket, M S Dhoni, Virat Kohli, Sachin Tendulkar, Sports, Indian Cricket, Indian Cricketers, Honorary Members of the Order of Australia, Order of Australia, Awards, Culture]
• Graph algorithm to find contextual nodes
◦ Cognitive Sciences
◦ Neural Networks
◦ Information Retrieval
▪ Associative and semantic networks
◦ Semantic Web
▪ Context generation
25
26
[Figure: spreading activation example. Labels: Cricket, M S Dhoni, Virat Kohli, Sachin Tendulkar, Sports, Indian Cricket, Indian Cricketers. Values shown: 0.8, 0.2, 0.6, 0.5, 0.4, 0.25, 0.1]
Activation function: determines the extent of spreading
27
• No decay, no weighted edges
◦ Result: the most generic categories are ranked higher
• Decay over the hops of the activation
◦ Decay values: 0.4, 0.6, 0.8
◦ Result: same as above
28
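The two variants above can be sketched as one propagation routine with a `decay` parameter. This is an illustrative sketch under stated assumptions, not the authors' activation function: `spread_activation`, the toy graph, and the assignment of scores to players are hypothetical. Setting `decay=1.0` corresponds to the no-decay variant; values below 1.0 attenuate the activation at every hop.

```python
from collections import defaultdict


def spread_activation(entity_scores, parents, decay=1.0, max_hops=10):
    """Propagate entity scores upward through parent categories.

    Each hop multiplies the passed-on value by `decay`; `max_hops`
    bounds the propagation so that cycles cannot loop forever.
    """
    activation = defaultdict(float)
    frontier = dict(entity_scores)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            for parent in parents.get(node, []):
                next_frontier[parent] += value * decay
        if not next_frontier:
            break
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


# Hypothetical toy graph: entity or category -> parent categories.
parents = {
    "M S Dhoni": ["Indian Cricketers"],
    "Virat Kohli": ["Indian Cricketers"],
    "Sachin Tendulkar": ["Indian Cricketers"],
    "Michael Clarke": ["Australian Cricketers"],
    "Indian Cricketers": ["Indian Cricket"],
    "Australian Cricketers": ["Australian Cricket"],
    "Indian Cricket": ["Cricket"],
    "Australian Cricket": ["Cricket"],
    "Cricket": ["Sports"],
}
# Illustrative entity scores (the pairing of scores and players is assumed).
entity_scores = {
    "M S Dhoni": 0.8,
    "Virat Kohli": 0.2,
    "Sachin Tendulkar": 0.6,
    "Michael Clarke": 0.4,
}

no_decay = spread_activation(entity_scores, parents, decay=1.0)
with_decay = spread_activation(entity_scores, parents, decay=0.6)
```

In this toy graph the no-decay run gives the generic categories (Cricket, Sports) the highest activation because they collect everything below them, which mirrors the first result on the slide. The toy graph is too small to say anything about the behavior with decay on the full Wikipedia graph.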
29
[Figure: base hierarchy rooted at Main topic classifications. Labels: Agriculture, Science, Sports, Health, Science Education, Scientists, Health Care, Health Economics]
Main Topic Classifications – 1
Technology – 2
Science – 2
Sports – 2
Business – 2
…
Technology Companies – 3
Scientists – 3
• Uneven distribution of nodes in the hierarchy
• Many-to-many category-subcategory relationships
30
[Figure: number of nodes per hierarchical level; x-axis: Hierarchical Level (1 to 16), y-axis: Number of Nodes (0 to 300,000)]
31
[Figure: number of nodes per hierarchical level, shown again; x-axis: Hierarchical Level (1 to 16), y-axis: Number of Nodes (0 to 300,000)]
• Nodes that intersect domains/subcategories are activated by diverse entities
33
[Figure: intersect example. Labels: Cricket, M S Dhoni, Virat Kohli, Sachin Tendulkar, Sports, Indian Cricket, Indian Cricketers, Michael Clarke, Shane Watson, Australian Cricket, Australian Cricketers. Counts shown: 3, 3, 5, 5, 2, 2]
34
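The counts in the intersect figure can be read as the number of distinct entities whose activation reaches each category. The sketch below computes that quantity on a toy graph; the edges are an assumed reading of the figure, and `activating_entities` is a hypothetical helper, not the authors' code.

```python
def activating_entities(entities, parents):
    """Map each category to the set of entities whose activation reaches it."""
    reached = {}
    for entity in entities:
        stack = list(parents.get(entity, []))
        seen = set()
        while stack:
            category = stack.pop()
            if category in seen:
                continue  # already handled for this entity (guards cycles)
            seen.add(category)
            reached.setdefault(category, set()).add(entity)
            stack.extend(parents.get(category, []))
    return reached


# Assumed edges: entity or category -> parent categories.
parents = {
    "M S Dhoni": ["Indian Cricketers"],
    "Virat Kohli": ["Indian Cricketers"],
    "Sachin Tendulkar": ["Indian Cricketers"],
    "Michael Clarke": ["Australian Cricketers"],
    "Shane Watson": ["Australian Cricketers"],
    "Indian Cricketers": ["Indian Cricket"],
    "Australian Cricketers": ["Australian Cricket"],
    "Indian Cricket": ["Cricket"],
    "Australian Cricket": ["Cricket"],
    "Cricket": ["Sports"],
}
entities = [
    "M S Dhoni",
    "Virat Kohli",
    "Sachin Tendulkar",
    "Michael Clarke",
    "Shane Watson",
]
counts = {
    category: len(sources)
    for category, sources in activating_entities(entities, parents).items()
}
```

With these edges, the Indian categories are reached by three entities, the Australian categories by two, and Cricket and Sports by all five, which matches the counts visible in the figure.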
35
• Motivation
• Approach
• Evaluation
• Conclusion & Future Work
36
• User study data
◦ 37 users
◦ 31,927 tweets
• Hierarchical Interest Graph
◦ 111,535 category interests
◦ 3,000 categories/user
◦ Ranking evaluation: top-50 categories
37
• How many relevant/irrelevant hierarchical interests are retrieved at top-k ranks?
◦ Graded Precision
• How well are the retrieved relevant hierarchical interests ranked at top-k?
◦ Mean Average Precision
• How early in the ranked hierarchical interests can we find a relevant result?
◦ Mean Reciprocal Rank
38
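The three questions above map onto standard ranking metrics. The sketch below uses binary relevance labels for simplicity, which is an assumption: the study used graded judgments, and the exact grading scheme is not given on the slides. Average precision is normalized here by the number of relevant items found in the top k, one of several common conventions. All names and the example judgments are illustrative.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant."""
    return sum(relevance[:k]) / k


def average_precision(relevance, k):
    """Mean of precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


# Hypothetical judgments: 1 = relevant, 0 = irrelevant, in ranked order.
judgments = {
    "user_1": [1, 0, 1, 1, 0],
    "user_2": [0, 1, 1, 0, 0],
}
k = 5
mean_precision = sum(precision_at_k(r, k) for r in judgments.values()) / len(judgments)
mean_average_precision = sum(average_precision(r, k) for r in judgments.values()) / len(judgments)
mean_reciprocal_rank = sum(reciprocal_rank(r) for r in judgments.values()) / len(judgments)
```

Mean reciprocal rank rewards a system only for how early the first relevant interest appears, while mean average precision also accounts for the positions of all later relevant interests.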
39
Priority Intersect works best, with
• 76% Mean Average Precision
• 98% Mean Reciprocal Rank
• How many of the categories inferred by the system were not explicitly mentioned by the user in tweets? (Semantic Web and Category:Semantic Web)
40
Priority Intersect at top-10
• 52% of categories were not mentioned in tweets by the user
• 65% of these were marked relevant
• 10% were marked "maybe"
• Mapped Wikipedia categories to Dmoz by string match.
◦ ~141K categories mapped
• Compared all category-subcategory relationships among the mapped categories in the hierarchy against the manually created Dmoz.
◦ 87% precision (relationships in the hierarchy that were also found in Dmoz)
41
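The comparison against Dmoz can be sketched as a set-based precision over category-subcategory pairs. Treating a relationship as correct only when the same direct pair exists in the reference is an assumption; the slides do not say whether indirect (ancestor) matches were also counted. The function `edge_precision` and the sample pairs are hypothetical.

```python
def edge_precision(hierarchy_edges, reference_edges):
    """Share of hierarchy relationships that also appear in the reference."""
    hierarchy_edges = set(hierarchy_edges)
    if not hierarchy_edges:
        return 0.0
    matched = hierarchy_edges & set(reference_edges)
    return len(matched) / len(hierarchy_edges)


# Hypothetical (parent, child) pairs restricted to string-matched categories.
hierarchy_edges = [
    ("Sports", "Cricket"),
    ("Sports", "Baseball"),
    ("Science", "Scientists"),
    ("Science", "Health"),
]
reference_edges = [
    ("Sports", "Cricket"),
    ("Sports", "Baseball"),
    ("Science", "Scientists"),
    ("Health", "Health Care"),
]
precision = edge_precision(hierarchy_edges, reference_edges)
```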
• Motivation
• Approach
• Evaluation
• Conclusion & Future Work
42
• Hierarchical Interest Graph (a hierarchical representation of user interests)
◦ Carries the hierarchical level of each interest, giving flexibility to personalize and recommend based on its abstractness.
• We semantically enhanced user interest profiles from Twitter using Knowledge Bases.
◦ Inferred abstract/hierarchical interests of Twitter users using Wikipedia
◦ This can help reduce the data sparsity problem by inferring relevant interests.
• The top-1 hierarchical interest generated by the system was correct for 36 of 37 user-study participants.
◦ Mean Average Precision at top-10 is 0.76
43
• Measuring the impact of Hierarchical Interest Graphs on recommendation of movies/music
◦ Datasets
▪ MovieLens
▪ Last.fm
• Tuning the system to use the hierarchical levels of interests for personalization and recommendation
◦ Sports (most abstract interest)
◦ Baseball (specific interest)
44
45
Contact: Pavan Kapanipathi
Twitter:@pavankaps
Email: pavan@knoesis.org

User Interests Identification From Twitter using Hierarchical Knowledge Base
