Web Usage Mining with Semantic Analysis
Laura Hollink, VU University Amsterdam
Peter Mika, Yahoo! Labs Barcelona
Roi Blanco, Yahoo! Labs Barcelona
Analysis of web user behavior
What are typical use cases? Are these carried out in a particular order?
Which use cases are not satisfied? And to which other sites do users
go?
oakland as bradd pitt movie moneyball movies.yahoo.com oakland as wikipedia.org
captain america movies.yahoo.com moneyball trailer movies.yahoo.com
money moneyball movies.yahoo.com
moneyball movies.yahoo.com movies.yahoo.com en.wikipedia.org movies.yahoo.com peter brand peter
nymag.com moneyball the movie www.imdb.com
moneyball trailer movies.yahoo.com moneyball trailer
brad pitt brad pitt moneyball brad pitt moneyball movie brad pitt moneyball brad pitt moneyball oscar www.imdb.co
relay for life calvert ocunty www.relayforlife.org trailer for moneyball movies.yahoo.com moneyball.movie
moneyball en.wikipedia.org movies.yahoo.com map of africa www.africaguide.com
money ball movie www.imdb.com money ball movie trailer moneyball.movie-trailer.com
brad pitt new www.zimbio.com www.usaweekend.com www.ivillage.com www.ivillage.com brad pitt news
news.search.yahoo.com moneyball trailer moneyball trailer www.imdb.com www.imdb.com
Transaction logs: sessions of queries and clicks
Are these use cases typical for all movies? Recent movies? Only for
Moneyball?
Why are these questions difficult to answer?
Sparsity of the event space
‣ 64% of queries are unique within a year
‣ even the most frequent patterns have extremely low support
[Table: the top 12 most frequent sessions observed in our data]
Tasks
Question 1: what are typical use cases?
‣ Task 1: find sequences of events in the data that are more frequent (have a higher support) than a threshold.
Question 2: what use cases are not satisfied?
‣ Task 2: learn to predict website abandonment from queries and clicks.
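Task 1 can be illustrated with a minimal sketch. It assumes each session is a list of event strings, that "sequence" means a contiguous run of events, and that support is the fraction of sessions containing the sequence; the function name, the `q:`/`c:` event prefixes, and the toy sessions are hypothetical and not taken from the slides.

```python
from collections import Counter


def frequent_sequences(sessions, length, min_support):
    """Return contiguous event sequences of `length` with support > min_support.

    Support is the fraction of sessions that contain the sequence at least once.
    """
    counts = Counter()
    for session in sessions:
        # A set ensures each sequence is counted at most once per session.
        seen = {tuple(session[i:i + length])
                for i in range(len(session) - length + 1)}
        counts.update(seen)
    n = len(sessions)
    return {seq: c / n for seq, c in counts.items() if c / n > min_support}


# Toy sessions: "q:" marks a query, "c:" marks a click (hypothetical encoding).
sessions = [
    ["q:moneyball", "c:movies.yahoo.com"],
    ["q:moneyball", "c:movies.yahoo.com", "q:moneyball trailer"],
    ["q:brad pitt", "c:www.imdb.com"],
    ["q:moneyball trailer", "c:www.imdb.com"],
]
patterns = frequent_sequences(sessions, length=2, min_support=0.4)
print(patterns)
```

On raw query strings almost every sequence is unique, which is the sparsity problem described above; the same counting becomes more useful once queries are replaced by more general labels.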
Approach
Example session: oakland as bradd pitt movie moneyball movies.yahoo.com oakland as wikipedia.org
Applied to the movie domain
Connect queries to entities in the linked open data cloud and use
properties of these entities to generalize and categorize queries.
Data processing and linking steps
1. link queries to entities
2. select types of entities (classes)
3. detect modifier words (download, trailer, cast, date, etc.)
4. identify navigational queries
5. identify ‘losing’ queries.
1. Linking queries to entities in the LOD cloud
• We link one entity to each query.
• The intent of about 40% of unique Web queries is to find a particular entity
[Pound, WWW2008].
• We link to Freebase (which has a lot of movie-related information) and DBpedia (because Wikipedia is
widely used).
2. Select one type per entity
• We use the Freebase API to get the semantic “types” of
each query URI
• Freebase ‘Notable types API’ is not official and not
documented.
• For repeatability and transparency, we have created our
own heuristics to select one type for each entity:
1. no internal or administrative types,
2. prefer established domains (‘Commons’) over user-defined schemas (‘Bases’)
3. aggregate specific types into more general types
a) subtypes of location -> location
b) subtypes of award winners and nominees -> award_winner_nonimee
c) prefer movie-related types over other types: film, actor, artist, tv_program, tv_actor and location (order of decreasing preference).
[Figure: one entity with several candidate types]
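The type-selection heuristics above could be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: the prefix lists, the type labels in `PREFERENCE`, and the function names are hypothetical, and only the location aggregation is shown (the award aggregation would be a second rule of the same shape).

```python
# Assumed identifiers; the slides do not give the exact prefixes or labels.
ADMIN_PREFIXES = ("/common/", "/type/", "/freebase/")
USER_PREFIXES = ("/base/", "/user/")
# film, actor, artist, tv_program, tv_actor, location in decreasing preference.
PREFERENCE = ["/film", "/film/actor", "/music/artist",
              "/tv_program", "/tv/tv_actor", "/location"]


def generalize(type_id):
    """Aggregate subtypes of location into the general location type."""
    if type_id.startswith("/location/"):
        return "/location"
    return type_id


def select_type(types):
    """Pick a single type for an entity from its list of candidate types."""
    # 1. Drop internal or administrative types.
    candidates = [t for t in types if not t.startswith(ADMIN_PREFIXES)]
    # 2. Prefer established domains over user-defined schemas.
    established = [t for t in candidates if not t.startswith(USER_PREFIXES)]
    if established:
        candidates = established
    # 3. Aggregate specific types into more general ones.
    candidates = [generalize(t) for t in candidates]
    # Prefer movie-related types, in decreasing order of preference.
    for preferred in PREFERENCE:
        if preferred in candidates:
            return preferred
    # Otherwise fall back to the first remaining candidate, if any.
    return candidates[0] if candidates else None


print(select_type(["/common/topic", "/base/fans/topic",
                   "/people/person", "/film/actor"]))
```

Falling back to the first remaining candidate is an arbitrary choice made for this sketch; the slides do not say how ties outside the preference list are resolved.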
3. Detect modifier words in queries
Top 100 most frequent words that appear in the query log before or after
entity names [Mika ISWC2009, Pantel WWW2012].
movie, movies, theater, cast, quotes, free, theaters, watch, 2011, new, tv,
show, dvd, online, sex, video, cinema, trailer, list, theatre . . .
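A rough way to collect such modifier words is to strip the linked entity name from each query and count what is left. The sketch below assumes the entity name occurs literally in the query; the function name and the sample pairs are hypothetical.

```python
from collections import Counter


def modifier_counts(linked_queries):
    """Count words that appear before or after the entity name in a query.

    linked_queries: iterable of (query, entity_name) pairs.
    """
    counts = Counter()
    for query, entity in linked_queries:
        q, e = query.lower(), entity.lower()
        # Naive substring match; a real system would match on token boundaries.
        start = q.find(e)
        if start == -1:
            continue  # entity name not found literally, skip this query
        before = q[:start].split()
        after = q[start + len(e):].split()
        counts.update(before + after)
    return counts


pairs = [
    ("moneyball trailer", "Moneyball"),
    ("watch moneyball online", "Moneyball"),
    ("captain america trailer", "Captain America"),
    ("brad pitt", "Brad Pitt"),
]
counts = modifier_counts(pairs)
print(counts.most_common(3))
```

Taking `counts.most_common(100)` over a full query log would give a list comparable in spirit to the one above.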
4. Identifying navigational queries
• A navigational query is a query entered with the intention of navigating to a
particular website.
• A common heuristic is to consider navigational queries where the query
matches the domain name of a clicked result.
• The “official homepage” is the value of dbpedia:homepage, dbpedia:url, and
foaf:homepage.
Examples (query -> clicked website):
netflix login -> www.netflix.com
banana -> www.bananas.org
European Parliament -> europarl.europa.eu
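The common domain-matching heuristic can be sketched as below. The matching rule is one possible reading of "the query matches the domain name" (either string contains the other after stripping non-alphanumerics); the function names are hypothetical, and the domain extraction is deliberately naive.

```python
import re
from urllib.parse import urlparse


def domain_label(url):
    """Return the second-to-last host label, e.g. 'netflix' for www.netflix.com.

    Naive: ignores multi-part suffixes such as .co.uk.
    """
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    parts = [p for p in host.split(".") if p != "www"]
    if len(parts) >= 2:
        return parts[-2]
    return parts[0] if parts else ""


def is_navigational(query, clicked_url):
    """Heuristic: the squashed query and the domain label contain one another."""
    label = domain_label(clicked_url)
    squashed = re.sub(r"[^a-z0-9]", "", query.lower())
    if not label or not squashed:
        return False
    return label in squashed or squashed in label


for query, site in [("netflix login", "www.netflix.com"),
                    ("banana", "www.bananas.org"),
                    ("European Parliament", "europarl.europa.eu")]:
    print(query, site, is_navigational(query, site))
```

Under this particular rule the three example pairs come out as True, True and False respectively, which shows how sensitive a purely string-based test is to the exact form of the query and the domain.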
5. Identify ‘losing’ queries
• A ‘losing’ query is the query that leads a user to abandon a service in favor
of another service.
• Common definition: A user repeats the same query and clicks on another
result in the list.
• Our broader, semantic definition:
Evaluation
1. Linking to entities and types
2. Detection of frequent usage patterns
3. Prediction of website abandonment
Applied to the movie domain
• Sample of server logs of Yahoo! Search in the US from June 2011, split into sessions.
• Only sessions that contain at least one visit to any of 16 popular movie sites.
• 1.7 million sessions, containing over 5.8 million queries and over 6.8 million clicks.
Evaluation of links to entities and types
• Compare manually created <query, entity> and <entity, type> pairs to
automatically created links.
• 2 samples: the 50 most frequent queries and 50 random queries.
Examples:
• Ambiguous query: “Green Lantern” - the movie or the fictional character?
• Wrong type: Oil peak is a serious game subject?
Evaluation of links to entities and types
[Figure: frequency of occurrence of queries, entities, and types]
Frequent usage patterns I
• Freebase:release_date property of entities.
[Figure: usage patterns for recent movies vs. older movies]
Frequent usage patterns II
• Sequences of consecutive query types.
Frequent usage patterns III
• A comparison of websites.
• Most frequent query types that lead to a click on a website.
[Figure: three bar charts showing, per query type, the proportion of queries that lead to a click on the website (axis from 0.0 to 0.6). Types shown, in the order listed:]
Website 1: /film, /film/actor, /tv_program, /people/person, /book/book, ional_universe/fictional_character, /music/artist, /tv/tv_actor, /location, /film/film_series
Website 2: /film, /location, /book/book, /film/actor, /business/employer, /fictional_universe/work_of_fiction, ional_universe/fictional_character, /tv_program, /architecture/building_function, /film/film_series
Website 3: /location, /business/employer, /film, /film/actor, /organization/organization, /architecture/building_function, /people/person, /tv_program, /tv/tv_network, /internet/website_category
[Figure: proportion of queries, Website A vs. Website B]
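A per-website comparison of this kind can be computed from typed click events. The sketch below assumes the input is a list of (query type, clicked site) pairs and reads the chart axis as "share of a site's clicks that come from queries of each type"; that reading, the function name, and the sample data are assumptions.

```python
from collections import Counter, defaultdict


def type_proportions(events):
    """For each site, the share of its clicks that came from each query type.

    events: iterable of (query_type, clicked_site) pairs.
    """
    per_site = defaultdict(Counter)
    for query_type, site in events:
        per_site[site][query_type] += 1
    result = {}
    for site, counts in per_site.items():
        total = sum(counts.values())
        # most_common() orders types from most to least frequent.
        result[site] = {t: c / total for t, c in counts.most_common()}
    return result


events = [
    ("/film", "site-a"),
    ("/film", "site-a"),
    ("/film/actor", "site-a"),
    ("/location", "site-b"),
]
props = type_proportions(events)
print(props)
```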
Predicting website abandonment
• Three classification tasks:
Given (part of) a session in which a user is lost/gained, predict...
1. ...whether a user will be gained for a given website.
2. ...given that the session includes a given website, whether this website is in the losing or gaining position.
3. ...given that the session includes two given websites, which one is in the gaining position.
• Gradient Boosted Decision Trees.
Discussion and future work
• Mining patterns of entire queries runs into data sparsity.
• We interpret the structure and semantics of the queries, using openly
available, up-to-date information on the Web.
• give a “semantic” definition of navigational and ‘losing’ queries
• find patterns of user behavior
• predict website abandonment
• This is the beginning:
• Use more properties of entities, more features.
• Detect more complex patterns.
• Explore other linked open datasets.
Thank you!
Questions?

WWW2013: Web Usage Mining with Semantic Analysis

  • 1. Web Usage Mining with Semantic Analysis Laura Hollink, VU University Amsterdam Peter Mika, Yahoo! Labs Barcelona Roi Blanco, Yahoo! Labs Barcelona
  • 2. Analysis of web user behavior What are typical use cases? Are these carried out in a particular order? Which use cases are not satisfied? And to which other sites do users go?
  • 3. Analysis of web user behavior What are typical use cases? Are these carried out in a particular order? Which use cases are not satisfied? And to which other sites do users go? Transaction logs: sessions of queries and clicks [Slide background: sample of raw session logs, e.g. the query "moneyball trailer" followed by a click on movies.yahoo.com]
  • 4. Analysis of web user behavior Transaction logs: sessions of queries and clicks Are these use cases typical for all movies? Recent movies? Only for Moneyball? [Slide background: the same sample of raw session logs as on slide 3]
  • 5. Why are these questions difficult to answer? Sparsity of the event space ‣ 64% of queries are unique within a year ‣ even the most frequent patterns have extremely low support To illustrate: [figure: the 12 most frequent sessions observed in our data]
  • 6. Tasks Question 1: what are typical use cases? ‣Task 1: find sequences of events in the data that are more frequent (have a higher support) than a threshold. Question 2: what use cases are not satisfied? ‣Task 2: learn to predict website abandonment from queries and clicks.
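Task 1 on slide 6 can be illustrated with a minimal sketch. It assumes that a session is a list of event labels, that a pattern is a contiguous run of events, and that support is the fraction of sessions containing the pattern. The function name, the `q:`/`c:` label convention, and the toy sessions are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter

def frequent_sequences(sessions, length, min_support):
    """Return contiguous event sequences of the given length whose support
    (fraction of sessions that contain them) is at least min_support."""
    counts = Counter()
    for events in sessions:
        # Use a set so each distinct sequence counts once per session.
        seen = {tuple(events[i:i + length])
                for i in range(len(events) - length + 1)}
        counts.update(seen)
    total = len(sessions)
    return {seq: n / total for seq, n in counts.items()
            if n / total >= min_support}

# Hypothetical toy sessions: queries are prefixed "q:", clicks "c:".
sessions = [
    ["q:moneyball", "c:movies.yahoo.com", "q:moneyball trailer", "c:www.imdb.com"],
    ["q:moneyball", "c:movies.yahoo.com"],
    ["q:moneyball trailer", "c:www.imdb.com"],
    ["q:captain america", "c:movies.yahoo.com"],
]
patterns = frequent_sequences(sessions, length=2, min_support=0.5)
```

With raw query strings as events, most sequences occur only once, which is the sparsity problem from slide 5. Replacing each query by a coarser label before counting is what makes patterns reach a useful support.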
  • 7. Approach Applied to the movie domain. Connect queries to entities in the linked open data cloud and use properties of these entities to generalize and categorize queries. [Slide shows one sample session from the logs]
  • 8. Data processing and linking steps 1. link queries to entities 2. select types of entities (classes) 3. detect modifier words (download, trailer, cast, date, etc.) 4. identify navigational queries 5. identify ‘losing’ queries [Slide shows one sample session from the logs]
  • 9. 1. Linking queries to entities in the LOD cloud • We link one entity to each query. • The intent of about 40% of unique Web queries is to find a particular entity [Pound, WWW2008]. • We link to Freebase (which has a lot of movie-related information) and DBpedia (because Wikipedia is widely used).
  • 10. 2. Select one type per entity • We use the Freebase API to get the semantic “types” of each query URI. • The Freebase ‘Notable types API’ is not official and not documented. • For repeatability and transparency, we have created our own heuristics to select one type for each entity: 1. no internal or administrative types, 2. prefer established domains (‘Commons’) over user-defined schemas (‘Bases’), 3. aggregate specific types into more general types: a) subtypes of location -> location, b) subtypes of award winners and nominees -> award_winner_nominee, c) prefer movie-related types over other types: film, actor, artist, tv_program, tv_actor and location (in order of decreasing preference).
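The type-selection heuristics on slide 10 could look roughly like the sketch below. The prefix lists for internal and user-defined types, the use of a `/location/` prefix as a stand-in for "subtype of location", and the two award type identifiers are assumptions made for illustration; only the preference order is taken from the slide.

```python
# Preference order from the slide, most preferred first.
PREFERRED = ["film", "actor", "artist", "tv_program", "tv_actor", "location"]

# Assumed prefixes for internal/administrative and user-defined types.
INTERNAL_PREFIXES = ("/type/", "/common/", "/freebase/")
USER_PREFIXES = ("/base/", "/user/")
# Assumed identifiers for award winners and nominees.
AWARD_TYPES = {"/award/award_winner", "/award/award_nominee"}

def generalize(type_id):
    """Heuristic 3: map a specific type to a more general label."""
    if type_id.startswith("/location/"):
        return "location"
    if type_id in AWARD_TYPES:
        return "award_winner_nominee"
    return type_id.rsplit("/", 1)[-1]  # e.g. "/film/actor" -> "actor"

def select_type(type_ids):
    """Pick a single label for an entity from its list of type identifiers."""
    # Heuristic 1: drop internal or administrative types.
    candidates = [t for t in type_ids if not t.startswith(INTERNAL_PREFIXES)]
    # Heuristic 2: prefer established domains over user-defined schemas.
    commons = [t for t in candidates if not t.startswith(USER_PREFIXES)]
    labels = [generalize(t) for t in (commons or candidates)]
    # Heuristic 3c: prefer movie-related labels in the given order.
    for preferred in PREFERRED:
        if preferred in labels:
            return preferred
    return labels[0] if labels else None

example = select_type(
    ["/common/topic", "/people/person", "/film/actor", "/base/popstra/celebrity"]
)
```

Falling back to the first remaining label when no movie-related type is present is a design choice of this sketch; the slide does not say how such cases are resolved.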
  • 11. 3. Detect modifier words in queries Top 100 most frequent words that appear in the query log before or after entity names [Mika ISWC2009, Pantel WWW2012]. movie, movies, theater, cast, quotes, free, theaters, watch, 2011, new, tv, show, dvd, online, sex, video, cinema, trailer, list, theatre . . .
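A minimal sketch of the counting step behind slide 11, assuming entity names are matched as exact lower-cased word sequences and that every remaining word in the query counts as a candidate modifier. The function name and sample data are hypothetical; the cited work may match entities and select words differently.

```python
from collections import Counter

def modifier_counts(queries, entity_names):
    """Count words that appear before or after a known entity name in a query."""
    counts = Counter()
    names = [name.lower().split() for name in entity_names if name.strip()]
    for query in queries:
        words = query.lower().split()
        for name in names:
            n = len(name)
            for i in range(len(words) - n + 1):
                if words[i:i + n] == name:
                    # Everything outside the matched name is a candidate modifier.
                    counts.update(words[:i] + words[i + n:])
                    break  # count each entity name at most once per query
    return counts

# Hypothetical sample queries and entity names.
queries = [
    "moneyball trailer",
    "watch moneyball online",
    "moneyball cast",
    "captain america trailer",
    "map of africa",
]
counts = modifier_counts(queries, ["Moneyball", "Captain America"])
top_modifiers = counts.most_common(1)
```

Taking `counts.most_common(100)` on a full query log would give a list comparable in spirit to the one on the slide.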
  • 12. 4. Identifying navigational queries • A navigational query is a query entered with the intention of navigating to a particular website. • A common heuristic is to consider a query navigational when it matches the domain name of a clicked result. • The “official homepage” is the value of dbpedia:homepage, dbpedia:url, and foaf:homepage. Examples (query → clicked site): netflix login → www.netflix.com; banana → www.bananas.org; European Parliament → europarl.europa.eu
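The common string-matching heuristic from slide 12 can be sketched as follows. This is one strict reading of "the query matches the domain name", assumed here for illustration: a query word, or the query with spaces removed, must equal the second-level label of the clicked host. The helper names are hypothetical, and the homepage-based semantic variant is not covered.

```python
def domain_label(host):
    """Return the second-level label of a host name,
    e.g. 'netflix' for 'www.netflix.com'.
    Naive: ignores multi-part public suffixes such as '.co.uk'."""
    parts = host.lower().split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]

def is_navigational(query, clicked_host):
    """String-matching heuristic: does the query name the clicked domain?"""
    label = domain_label(clicked_host)
    words = query.lower().split()
    return label in words or "".join(words) == label

# The three query/site pairs shown on the slide.
results = [
    is_navigational("netflix login", "www.netflix.com"),
    is_navigational("banana", "www.bananas.org"),
    is_navigational("European Parliament", "europarl.europa.eu"),
]
```

Under this strict variant only the first pair is flagged. A looser match (substring or stemming) would also flag the second pair, and neither variant can connect "European Parliament" to europarl.europa.eu from the strings alone.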
  • 13. 5. Identify ‘losing’ queries • A ‘losing’ query is the query that leads a user to abandon a service in favor of another service. • Common definition: A user repeats the same query and clicks on another result in the list. • Our broader, semantic definition:
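The common definition on slide 13 (a repeated query followed by a click on a different result) can be sketched as below. The event representation, the use of the clicked host as a stand-in for "result", and the function name are assumptions; the broader semantic definition is not shown in the transcript and is not attempted here.

```python
def find_losing_queries(events):
    """events: list of (kind, value) tuples with kind in {"query", "click"}.
    Returns (query, first_site, later_site) triples where the same query was
    issued again and the user then clicked a different site."""
    first_click = {}  # normalized query -> site clicked first for that query
    issued = {}       # normalized query -> number of times it was issued
    current = None
    losing = []
    for kind, value in events:
        if kind == "query":
            current = value.strip().lower()
            issued[current] = issued.get(current, 0) + 1
        elif kind == "click" and current is not None:
            if current not in first_click:
                first_click[current] = value
            elif issued[current] > 1 and value != first_click[current]:
                losing.append((current, first_click[current], value))
    return losing

# Hypothetical session: the query is repeated and a different site is clicked.
session = [
    ("query", "moneyball trailer"),
    ("click", "movies.yahoo.com"),
    ("query", "moneyball trailer"),
    ("click", "www.imdb.com"),
]
losing = find_losing_queries(session)
```

In each returned triple, the first site is the one the user moved away from and the later site is the one the user moved to.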
  • 14. Evaluation 1. Linking to entities and types 2. Detection of frequent usage patterns 3. Prediction of website abandonment Applied to the movie domain • Sample of server logs of Yahoo! Search in the US from June 2011, split into sessions. • Only sessions that contain at least one visit to any of 16 popular movie sites. • 1.7 million sessions, containing over 5.8 million queries and over 6.8 million clicks.
  • 15. Evaluation of links to entities and types • Compare manually created <query, entity> and <entity, type> pairs to automatically created links. • 2 samples: the 50 most frequent queries and 50 random queries. Examples: • Ambiguous query: “Green Lantern” - the movie or the fictional character? • Wrong type: Oil peak is a serious game subject?
  • 16. Evaluation of links to entities and types [Figure: three panels (Queries, Entities, Types), each with the axis label “Frequency of occurrence”]
  • 17. Frequent usage patterns I • Freebase:release_date property of entities. [Figure: two panels, “Recent movies” and “Older movies”]
  • 18. Frequent usage patterns II • Sequences of consecutive query types.
  • 19. Frequent usage patterns III • A comparison of websites. • Most frequent query types that lead to a click on a website. [Figure: one bar chart each for Website 1, Website 2 and Website 3, showing the proportion of queries that lead to a click on the website (scale 0.0 to 0.6) per Freebase query type, e.g. /film, /film/actor, /tv_program, /people/person, /location, /business/employer]
  • 20. Predicting website abandonment • 3 classification tasks: given (part of) a session in which a user is lost/gained, predict 1. whether a user will be gained for a given website; 2. given that the session includes a given website, whether this website is in the losing or gaining position; 3. given that the session includes two given websites, which one is in the gaining position. • Gradient Boosted Decision Trees.
  • 21. Discussion and future work • Mining patterns of entire queries gives problems with sparsity of data. • We interpret the structure and semantics of the queries, using openly available, up-to-date information on the Web: • give a “semantic” definition of navigational and ‘losing’ queries • find patterns of user behavior • predict website abandonment • This is the beginning: • Use more properties of entities, more features. • Detect more complex patterns. • Explore other linked open datasets.