Improving Semantic Search Using
      Query Log Analysis

            Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna
                                                      OAK Research Group,
                                           Department of Computer Science,
                                                  University of Sheffield, UK
Outline

• Introduction
• Semantic Query Log Analysis
  - Query-Concepts Model
  - Concept-Predicates Model
  - Instance-Types Model
• Results Augmentation
• Data Visualisation
INTRODUCTION
Motivation

• Little work on the results returned (answers) and their
  presentation style.
   – Users want direct answers augmented with more
     information for a richer experience [1]
   – Users want a more user-friendly and attractive results
     presentation format [1]

• Semantic query logs: logs of queries issued to repositories
  containing RDF data.


[1] See our paper from this morning’s IWEST 2012 workshop
Related Work
Semantic query logs analysis:
• Moller et al. identified patterns of Linked Data usage with
  respect to different types of agents.

• Arias et al. analysed the structure of the SPARQL queries
  to identify most frequent language elements.

• Luczak-Rösch et al. analysed query logs to detect errors
  and weaknesses in LD ontologies and support their
  maintenance.
Related Work (cont’d)

How our work is different:
We analyse semantic query logs to produce models capturing
different patterns of information needs on Linked Data:

 • Concepts used together in a query: query-concepts model
 • Predicates used with a concept: concept-predicates model
 • Concepts used as types of a LD entity: instance-types model

The models make use of the “collaborative knowledge”
inherent in the logs to enhance the search process.
SEMANTIC QUERY LOG ANALYSIS
Extraction
• Query log entries follow the Combined Log Format (CLF).
• The SPARQL query is extracted from each entry, e.g.:

   SELECT DISTINCT ?genre, ?instrument WHERE
   {
       <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
       <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
       <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
   }
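
The extraction step can be sketched roughly as follows. This is a minimal illustration, assuming the query travels as a URL-encoded `query` parameter in the request line; the regular expression, the function name `extract_sparql` and the sample log entry are hypothetical and not taken from the USEWOD data.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "agent"
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def extract_sparql(log_line):
    """Return the decoded SPARQL query in a log line, or None if absent."""
    match = CLF_PATTERN.match(log_line)
    if match is None:
        return None
    # parse_qs percent-decodes values and turns '+' into spaces.
    params = parse_qs(urlsplit(match.group("target")).query)
    queries = params.get("query")
    return queries[0] if queries else None

# Hypothetical log entry, made up for illustration.
sample = (
    '0.0.0.0 - - [01/Jan/2011:00:00:00 +0000] '
    '"GET /sparql?query=SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D HTTP/1.1" '
    '200 512 "-" "example-agent"'
)
print(extract_sparql(sample))  # SELECT ?s WHERE { ?s ?p ?o }
```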
Analysis
   SELECT DISTINCT ?genre, ?instrument WHERE
   {
       <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
       <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
       <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
   }


• For each bound resource (subject or object), query the
   endpoint for the type of the resource:

   http://dbpedia.org/resource/Ringo_Starr
       --type-->  http://dbpedia.org/ontology/MusicalArtist
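
A small sketch of this analysis step, assuming the triple patterns have already been parsed into (subject, predicate, object) string tuples. The helper names `bound_resources` and `type_query` are hypothetical; only the Ringo_Starr URI comes from the slide.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def bound_resources(triple_patterns):
    """Collect URIs used as subject or object, skipping variables."""
    found = []
    for subject, _predicate, obj in triple_patterns:
        for term in (subject, obj):
            # Assumption: bound resources are written as <full-uri>.
            if term.startswith("<") and term.endswith(">"):
                uri = term[1:-1]
                if uri not in found:
                    found.append(uri)
    return found

def type_query(uri):
    """Build the SPARQL query that asks the endpoint for a resource's types."""
    return f"SELECT DISTINCT ?type WHERE {{ <{uri}> <{RDF_TYPE}> ?type }}"

patterns = [
    ("<http://dbpedia.org/resource/Ringo_Starr>", "dbpedia:genre", "?genre"),
    ("<http://dbpedia.org/resource/Ringo_Starr>", "dbpedia:instrument", "?instrument"),
]
for uri in bound_resources(patterns):
    print(type_query(uri))
```

Sending the generated query to an endpoint is left out here, since it depends on the client library in use.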
Query-Concepts Model
   SELECT DISTINCT ?genre, ?instrument WHERE

   { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
     <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. }



1) Retrieve types of resources in the query:
   Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
   The_Beatles type dbpedia-owl:Band, schema:MusicGroup


2) Increment the co-occurrence of each concept in the first list
   with each concept in the second:

   (MusicalArtist, Band)          (MusicalPerformer, MusicGroup)
   (MusicalArtist, MusicGroup)    (MusicalPerformer, Band)
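
The update rule for the query-concepts model could be sketched like this. It assumes the types of each resource are already known and treats concept pairs as unordered, which the slides do not state explicitly; `update_query_concepts` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations, product

def update_query_concepts(model, types_per_resource):
    """Count concept pairs whose members type different resources of one query."""
    for first, second in combinations(types_per_resource, 2):
        for a, b in product(first, second):
            # Assumption: pairs are unordered, so store them in sorted order.
            model[tuple(sorted((a, b)))] += 1
    return model

query_concepts = Counter()
update_query_concepts(query_concepts, [
    ["MusicalArtist", "MusicalPerformer"],  # types of Ringo_Starr
    ["Band", "MusicGroup"],                 # types of The_Beatles
])
for pair, count in sorted(query_concepts.items()):
    print(pair, count)
```

Running this over every query in the logs accumulates the counts that later rank related concepts.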
Concept-Predicates Model
    SELECT DISTINCT ?genre, ?instrument WHERE

    { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
       <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
       <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. }


1) Retrieve types of resources used as subjects in the query:
    Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer


2) Identify bound predicates (dbpedia:genre, dbpedia:instrument)

3) Increment the co-occurrence of each type with the predicate used in
    the same triple pattern:

   (MusicalPerformer, genre)    (MusicalPerformer, instrument)
   (MusicalArtist, genre)       (MusicalArtist, instrument)
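
A possible sketch of the concept-predicates update, assuming parsed triple patterns and a precomputed mapping from subjects to their types. The names `update_concept_predicates` and `types_of` are illustrative only.

```python
from collections import Counter

def update_concept_predicates(model, triple_patterns, types_of):
    """Count each subject type with the bound predicate of the same pattern."""
    for subject, predicate, _obj in triple_patterns:
        if predicate.startswith("?"):
            continue  # unbound predicate such as ?rel carries no information
        for concept in types_of.get(subject, ()):
            model[(concept, predicate)] += 1
    return model

types_of = {"Ringo_Starr": ["MusicalArtist", "MusicalPerformer"]}
patterns = [
    ("Ringo_Starr", "?rel", "The_Beatles"),
    ("Ringo_Starr", "genre", "?genre"),
    ("Ringo_Starr", "instrument", "?instrument"),
]
concept_predicates = update_concept_predicates(Counter(), patterns, types_of)
for pair, count in sorted(concept_predicates.items()):
    print(pair, count)
```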
Instance-Types Model
   SELECT DISTINCT ?genre, ?instrument WHERE

   { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
     <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. }



1) Retrieve types of resources in the query:
   Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
   The_Beatles type dbpedia-owl:Band, schema:MusicGroup


2) Increment the co-occurrence of concepts found as types for the
   same instance:

   (MusicalArtist, MusicalPerformer)
   (Band, MusicGroup)
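
In contrast to the query-concepts model, only types of the same instance are paired here. A minimal sketch, again assuming unordered pairs and using the hypothetical name `update_instance_types`:

```python
from collections import Counter
from itertools import combinations

def update_instance_types(model, types_per_resource):
    """Count pairs of concepts that appear as types of the same instance."""
    for types in types_per_resource:
        # sorted(set(...)) removes duplicates and fixes the pair order.
        for a, b in combinations(sorted(set(types)), 2):
            model[(a, b)] += 1
    return model

instance_types = update_instance_types(Counter(), [
    ["MusicalArtist", "MusicalPerformer"],  # types of Ringo_Starr
    ["Band", "MusicGroup"],                 # types of The_Beatles
])
print(sorted(instance_types.items()))
```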
RESULT AUGMENTATION
Dataset
• Two sets of DBpedia query logs made available at the
  USEWOD2011 and USEWOD2012 workshops.

• The logs contained around 5 million queries issued to
  DBpedia over a time period spanning almost 2 years

                                            USEWOD2012   USEWOD2011
   Number of analyzed queries               8866028      4951803
   Number of unique triple patterns         4095011      2641098
   Number of unique bound triple patterns   3619216      2571662
Results Enhancement
• Google, Yahoo!, Bing, etc. enhance search
  results using structured data




• FalconS and VisiNav return extra information together
  with each entity in the answers (e.g. type, label)

• Evaluation of Semantic Search showed that augmenting
  answers with extra information provides a richer user
  experience [2].
[2] See our paper from this morning’s IWEST 2012 workshop
FalconS Results
Query: ‘population of New York city’

• The information shown depends on a manually (randomly)
  predefined set.
Motivation for proposed approach
• Utilizing query logs as a source of collaborative knowledge
  able to capture implicit associations between Linked Data
  entities and properties.

• Use this to select which information to show the user.

• Two recent studies [3] analyzed semantic query logs and
  observed that a class of entities is usually queried with
  similar relations and concepts.


[3] Luczak-Rösch et al.; Elbedweihy et al.
Two Related Types of Result Augmentation
1. Additional result-related information.
  – More details about each result item
  – Provides better understanding of the answer.


2. Additional query-related information.
  – More results related to the query entities
  – Assists users in discovering useful findings
    (serendipity)
Return additional result-related information
Steps
1) For each result item, find the types of the instance.

2) Extract the most frequently queried predicates associated
   with these types from the concept-predicates model.

3) Generate queries with each pair (instance, predicate).
     e.g. (<…dbpedia.org…/Ringo_Starr> , genre)

4) Show aggregated results to the user.
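
The steps above might be sketched as follows. The model counts and the `http://example.org/` URIs are made up for illustration, and summing counts across all types of an instance is an assumption rather than something the slides specify.

```python
from collections import Counter

def top_predicates(model, concepts, k=3):
    """Rank predicates by how often they were queried with the given concepts."""
    totals = Counter()
    for (concept, predicate), count in model.items():
        if concept in concepts:
            totals[predicate] += count
    return [predicate for predicate, _ in totals.most_common(k)]

def augmentation_queries(instance_uri, predicate_uris):
    """Generate one SPARQL query per (instance, predicate) pair."""
    return [
        f"SELECT DISTINCT ?value WHERE {{ <{instance_uri}> <{p}> ?value }}"
        for p in predicate_uris
    ]

# Hypothetical counts; real values would come from the analysed logs.
model = Counter({
    ("MusicalArtist", "http://example.org/genre"): 5,
    ("MusicalArtist", "http://example.org/instrument"): 3,
    ("MusicalArtist", "http://example.org/birthDate"): 1,
    ("Band", "http://example.org/genre"): 9,
})
best = top_predicates(model, {"MusicalArtist"}, k=2)
for query in augmentation_queries("http://example.org/Ringo_Starr", best):
    print(query)
```

The results of the generated queries would then be grouped per predicate before being shown to the user.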
Return additional result-related information
• MusicalArtist -> genre, associatedBand, occupation, instrument,
  birthDate, birthPlace, hometown, prop:yearsActive, foaf:surname,
  prop:associatedActs, …

Query: “Who played drums for the Beatles?”

Result: Ringo Starr
  Pop music, Rock music (genre)
  Keyboard, Drum, Acoustic guitar (instrument)
  The Beatles, Plastic Ono Band, Rory Storm (assoc. Band)
Return additional query-related information
Steps
1) Extract all concepts from query.

2) For any instances, find their types.

3) For each query concept, find most frequently occurring
   concepts from the query-concepts model.

4) For each related concept, query for instances that have a
   relation with the originating instance.

5) Show aggregated results to the user.
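
A rough sketch of steps 3 and 4, assuming the query-concepts model stores unordered concept pairs with counts. The counts, the `http://example.org/` URIs and the choice to match relations in either direction with `UNION` are illustrative assumptions.

```python
from collections import Counter

def related_concepts(model, concept, k=3):
    """Return the k concepts that co-occur most often with `concept`."""
    totals = Counter()
    for (a, b), count in model.items():
        if a == b:
            continue  # ignore a concept paired with itself
        if a == concept:
            totals[b] += count
        elif b == concept:
            totals[a] += count
    return [other for other, _ in totals.most_common(k)]

def related_instances_query(instance_uri, concept_uri):
    """Ask for instances of a concept linked to the originating instance."""
    return (
        "SELECT DISTINCT ?related WHERE { "
        f"?related a <{concept_uri}> . "
        f"{{ ?related ?p <{instance_uri}> }} UNION {{ <{instance_uri}> ?p ?related }} "
        "}"
    )

# Hypothetical counts; real values would come from the analysed logs.
model = Counter({("City", "Person"): 7, ("Book", "City"): 9, ("Band", "Person"): 4})
print(related_concepts(model, "City", k=2))
print(related_instances_query("http://example.org/Sheffield", "http://example.org/Person"))
```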
Return additional query-related information
• City -> Book, Person, Country, Organisation, SportsTeam, MusicGroup,
  Film, RadioStation, River, University, SoccerPlayer, Hospital, ...


Query: “Where is the University of Sheffield located?”

Result: Sheffield, UK
  Nick Clegg, Clive Betts, David Blunkett (Person)
  Sheffield United F.C., Sheffield Wednesday (SportsTeam)
  Hallam FM, Real Radio, BBC Radio Sheffield (RadioStation)
  Jessop Hosp., Northern General, Royal Hallamshire (Hospital)
  Uni. of Sheffield, Sheffield Hallam Uni. (University)
VISUALISATION
Data Visualization
• View-based interfaces (e.g. Semantic Crystal and Smeagol)
  support users in query formulation by showing the
  underlying data and connections.

• Helpful for users, especially those unfamiliar with the
  search domain.

• Try to bridge the gap between user terms and tool terms
  (habitability problem)

• Face the challenge of visualizing large datasets without
  cluttering the view and degrading the user experience.
Data Visualization: Proposed approach
• Visualizing large datasets (especially heterogeneous ones)
  is a challenge.

• To overcome this, we need to select and visualize specific
  parts of the data.

• Exploit collaborative knowledge in query logs to derive
  selection of concepts and predicates added to user’s
  subgraph of interest.
Data Visualization: Proposed approach
Steps
1) User enters NL query
2) Return best-attempt results
3) Identify query instances and find their types
4) For each type:
     • Extract most queried predicates associated with it from
       concept-predicates model.
     • Extract most queried concepts associated with it from
       query-concepts model.
5) Add these to the user’s query graph (see next slide)
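
Steps 4 and 5 could look roughly like the sketch below, which represents the query graph as a set of (source, label, target) edges. The function name, the `relatedTo` edge label, the `?` placeholder for an open value and the ranked lists for “Country” are all hypothetical.

```python
def extend_query_graph(edges, instance, types, ranked_predicates, ranked_concepts, k=2):
    """Return a new edge set with the top-k predicates and concepts per type."""
    extended = set(edges)
    for concept in types:
        extended.add((instance, "type", concept))
        # Most queried predicates for this type (concept-predicates model).
        for predicate in ranked_predicates.get(concept, [])[:k]:
            extended.add((concept, predicate, "?"))
        # Most queried concepts with this type (query-concepts model).
        for related in ranked_concepts.get(concept, [])[:k]:
            extended.add((concept, "relatedTo", related))
    return extended

# Hypothetical rankings; real ones would be derived from the log models.
graph = extend_query_graph(
    set(),
    "Egypt",
    ["Country"],
    {"Country": ["capital", "population", "currency"]},
    {"Country": ["City", "Person"]},
    k=2,
)
for edge in sorted(graph):
    print(edge)
```

Limiting each type to its top k entries is what keeps the view from becoming cluttered on large datasets.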
Example
Query: “What is the capital of Egypt?”

Best-attempt results
  Answer: Cairo

Result-related information
➔ latitude: 30.058056
➔ longitude: 31.228889
➔ population: 6758581
➔ area: 453000000
➔ time zone: Eastern European Time
➔ subdivision: Governorates of Egypt
➔ page: http://www.cairo.gov.eg/default.aspx
➔ nickname: The City of a Thousand Minarets, Capital of the
  Arab World
➔ depiction: [image]
Example
Query: “What is the capital of Egypt?”
Answer: Cairo

Query-related information
➔ Cairo Uni., Ain Shams Uni., German Uni., British Uni. (University)
➔ Ittihad El Shorta, El Shams Club, AlNasr Egypt (SportsTeam)
➔ Orascom Telecom, HSBC Bank, EgyptAir, Olympic Grp (Organisation)
➔ Nile River (River)
➔ Al Azhar Park (Park)
➔ Hani Shaker, Sherine, Umm Kulthum, Amr Diab (MusicalArtist)
➔ Nile TV, AL Nile, Al-Baghdadia TV (Broadcaster)
➔ Egyptian Museum, Museum of Islamic Art (Museum)
Data Visualization: Proposed approach
Step 5: Add concepts and predicates to the user’s query graph

[Figure: query graph around the query instance, showing the most
queried predicates with “Country” and the most queried concepts
with “Country”]
Questions




Thank You


Questions?

Improving Semantic Search Using Query Log Analysis

  • 1. Improving Semantic Search Using Query Log Analysis Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna OAK Research Group, Department of Computer Science, University of Sheffield, UK
  • 2. Outline • Introduction • Semantic Query Logs Analysis - Query-Concepts Model - Concepts-Predicates Model - Instance-Types Model • Results Augmentation • Data Visualisation
  • 4. Motivation • Little work on results returned (answers) and presentation style. – Users want direct answers augmented with more information for richer experience1 – Users want more user-friendly and attractive results presentation format1 • Semantic query logs: logs of queries issued to repositories containing RDF data. 1. See our paper from this morning’s IWEST 2012 workshop
  • 5. Related Work Semantic query logs analysis: • Moller et al. identified patterns of Linked Data usage with respect to different types of agents. • Arias et al. analysed the structure of the SPARQL queries to identify most frequent language elements. • Luczak-Rösch et al. analysed query logs to detect errors and weaknesses in LD ontologies and support their maintenance.
  • 6. Related Work (cont’d) How our work is different: Analyze semantic query logs to produce models capturing different patterns of information needs on Linked Data:  Concepts used together in a query: query-concepts model  Predicate used with a concept: concept-predicates model  Concepts used as types of a LD entity: instance-types model The models make use of the “collaborative knowledge” inherent in the logs to enhance the search process.
  • 8. Extraction • Query logs entries follow the Combined Log Format (CLF): Extract SPARQL query SELECT DISTINCT ?genre, ?instrument WHERE { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>. <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre. <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. }
  • 9. Analysis SELECT DISTINCT ?genre, ?instrument WHERE { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>. <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre. <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. } • For each bound resource (subject or object) -> query endpoint for the type of the resource http://dbpedia.org/resource/Ringo_Starr type http://dbpedia.org/ontology/MusicalArtist
  • 10. Query-Concepts Model SELECT DISTINCT ?genre, ?instrument WHERE { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>. <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. } 1) Retrieve types of resources in the query: Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer The_Beatles type dbpedia-owl:Band, schema:MusicGroup 2) Increment the co-occurrence of each concept in the first list with each concept in the second: MusicalArtist Band MusicalPerformer MusicGroup MusicalArtist MusicGroup MusicalPerformer Band
  • 11. Concept-Predicates Model SELECT DISTINCT ?genre, ?instrument WHERE { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>. <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre. <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. } 1) Retrieve types of resources used as subjects in the query: Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer 2) Identify bound predicates (dbpedia:genre, dbpedia:instrument) 3) Increment the co-occurrence of each type with the predicate used in the same triple pattern: MusicalPerformer genre MusicalPerformer instrument MusicalArtist genre MusicalArtist instrument
  • 12. Instance-Types Model SELECT DISTINCT ?genre, ?instrument WHERE { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>. <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. } 1) Retrieve types of resources in the query: Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer The_Beatles type dbpedia-owl:Band, schema:MusicGroup 2) Increment the co-occurrence of concepts found as types for the same instance: MusicalArtist MusicalPerformer Band MusicGroup
  • 14. Dataset • Two sets of DBpedia query logs made available at the USEWOD2011 and USEWOD2012 workshops. • The logs contained around 5 million queries issued to DBpedia over a time period spanning almost 2 years USEWOD2012 USEWOD2011 Number of analyzed queries 8866028 4951803 Number of unique triple patterns 4095011 2641098 Number of unique bound triple patterns 3619216 2571662
  • 15. Results Enhancement • Google, Yahoo!, Bing, etc. enhance search results using structured data • FalconS and VisiNav return extra information together with each entity in the answers (e.g. type, label) • Evaluation of Semantic Search showed that augmenting answers with extra information provides a richer user experience2. 2. See our paper from this morning’s IWEST 2012 workshop
  • 16. FalconS Results Query: `population of New York city’ • Information chosen depend on manually (randomly) predefined set.
  • 17. Motivation for proposed approach • Utilizing query logs as a source of collaborative knowledge able to capture implicit associations between Linked Data entities and properties. • Use this to select which information to show the user. • Two recent studies3 analyzed semantic query logs and observed that a class of entities is usually queried with similar relations and concepts. 3. Luczak-Rösch et al. ; Elbedweihy et al.
  • 18. Two Related Types of Result Augmentation 1. Additional result-related information. – More details about each result item – Provides better understanding of the answer. 2. Additional query-related information. – More results related to the query entities – Assists users in discovering useful findings (serendipity)
  • 19. Return additional result-related information Steps 1) For each result item, find the types of the instance. 2) Extract the most frequently queried predicates associated with those types from the concept-predicates model. 3) Generate queries for each (instance, predicate) pair, e.g. (<…dbpedia.org…/Ringo_Starr>, genre). 4) Show the aggregated results to the user.
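The steps above can be sketched as a small Python routine. This is only an illustration under stated assumptions: the counts in `CONCEPT_PREDICATES` are invented, the `http://example.org/...` URIs are placeholders rather than real DBpedia identifiers, and the helper names `top_predicates` and `build_queries` are hypothetical.

```python
from collections import Counter

# Hypothetical concept-predicates model: concept -> how often each predicate
# was queried with it. The counts are invented for illustration.
CONCEPT_PREDICATES = {
    "MusicalArtist": Counter(
        {"genre": 50, "associatedBand": 40, "occupation": 30, "instrument": 20}
    ),
}


def top_predicates(types, model, k=3):
    """Return the k most frequently queried predicates across the given types."""
    merged = Counter()
    for concept in types:
        merged.update(model.get(concept, Counter()))
    return [predicate for predicate, _ in merged.most_common(k)]


def build_queries(instance_uri, predicates, prefix="http://example.org/ontology/"):
    """Generate one SPARQL query per (instance, predicate) pair."""
    return [
        f"SELECT DISTINCT ?value WHERE {{ <{instance_uri}> <{prefix}{p}> ?value . }}"
        for p in predicates
    ]


predicates = top_predicates(["MusicalArtist"], CONCEPT_PREDICATES, k=2)
queries = build_queries("http://example.org/resource/Ringo_Starr", predicates)
for query in queries:
    print(query)
```

Executing the generated queries against an endpoint and aggregating the answers (step 4) is left out, since it depends on the SPARQL client in use.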
  • 20. Return additional result-related information • MusicalArtist -> genre, associatedBand, occupation, instrument, birthDate, birthPlace, hometown, prop:yearsActive, foaf:surname, prop:associatedActs, … Query: “Who played drums for the Beatles?” Result: Ringo Starr. Additional information: Pop music, Rock music (genre); Keyboard, Drum, Acoustic guitar (instrument); The Beatles, Plastic Ono Band, Rory Storm (assoc. Band).
  • 21. Return additional query-related information Steps 1) Extract all concepts from the query. 2) For any instances, find their types. 3) For each query concept, find the most frequently co-occurring concepts from the query-concepts model. 4) For each related concept, query for instances that have a relation with the originating instance. 5) Show the aggregated results to the user.
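Steps 3 and 4 might look like the following Python sketch. The counts in `QUERY_CONCEPTS` are invented, the `http://example.org/...` URIs are placeholders, and the function names are hypothetical. The slide does not say in which direction the relation should hold, so matching it in both directions with `UNION` is an assumption of this example.

```python
from collections import Counter

# Hypothetical query-concepts model: concept -> how often other concepts
# appeared in the same query. The counts are invented for illustration.
QUERY_CONCEPTS = {
    "City": Counter({"Book": 9, "Person": 8, "Country": 7}),
}


def related_concepts(concept, model, k=2):
    """Return the k concepts most frequently used together with `concept`."""
    return [other for other, _ in model.get(concept, Counter()).most_common(k)]


def related_instances_query(instance_uri, concept_uri):
    """Build a SPARQL query for instances of a concept linked to an instance.

    Assumption: the relation may point in either direction, hence the UNION.
    """
    return (
        "SELECT DISTINCT ?related WHERE { "
        f"?related a <{concept_uri}> . "
        f"{{ ?related ?rel <{instance_uri}> . }} "
        f"UNION {{ <{instance_uri}> ?rel ?related . }} "
        "}"
    )


concepts = related_concepts("City", QUERY_CONCEPTS, k=2)
queries = [
    related_instances_query(
        "http://example.org/resource/Sheffield",
        f"http://example.org/ontology/{concept}",
    )
    for concept in concepts
]
for query in queries:
    print(query)
```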
  • 22. Return additional query-related information • City -> Book, Person, Country, Organisation, SportsTeam, MusicGroup, Film, RadioStation, River, University, SoccerPlayer, Hospital, ... Query: “Where is the University of Sheffield located?” Result: Sheffield, UK. Additional information: Nick Clegg, Clive Betts, David Blunkett (Person); Sheffield United F.C., Sheffield Wednesday (SportsTeam); Hallam FM, Real Radio, BBC Radio Sheffield (RadioStn.); Jessop Hosp., Northern General, Royal Hallamshire (Hospital); Uni. of Sheffield, Sheffield Hallam Uni. (University).
  • 24. Data Visualization • View-based interfaces (e.g. Semantic Crystal and Smeagol) support users in query formulation by showing the underlying data and connections. • Helpful for users, especially those unfamiliar with the search domain. • They try to bridge the gap between user terms and tool terms (the habitability problem). • They face the challenge of visualizing large datasets without cluttering the view and degrading the user experience.
  • 25. Data Visualization: Proposed approach • Visualizing large datasets (especially heterogeneous ones) is a challenge. • To overcome this, we need to select and visualize specific parts of the data. • Exploit the collaborative knowledge in query logs to derive the selection of concepts and predicates added to the user’s subgraph of interest.
  • 26. Data Visualization: Proposed approach Steps 1) User enters NL query 2) Return best-attempt results 3) Identify query instances and find their types 4) For each type: • Extract most queried predicates associated with it from concept-predicates model. • Extract most queried concepts associated with it from query-concepts model. 5) Add these to the user’s query graph (see next slide)
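Steps 3 to 5 can be sketched as a function that returns the edges to add around the query instance. This is a minimal illustration only: both model tables hold invented counts, and the function name `expand_query_graph` and the edge labels (`type`, `predicate`, `relatedConcept`) are hypothetical choices, not part of the proposed approach.

```python
from collections import Counter

# Hypothetical models with invented counts, for illustration only.
CONCEPT_PREDICATES = {
    "Country": Counter({"capital": 5, "population": 4, "currency": 3}),
}
QUERY_CONCEPTS = {
    "Country": Counter({"City": 6, "Person": 5, "River": 2}),
}


def expand_query_graph(instance, types, concept_predicates, query_concepts, k=2):
    """Return (source, label, target) edges to add around the query instance."""
    edges = []
    for concept in types:
        edges.append((instance, "type", concept))
        # Most queried predicates for this type (concept-predicates model).
        for predicate, _ in concept_predicates.get(concept, Counter()).most_common(k):
            edges.append((concept, "predicate", predicate))
        # Most queried concepts for this type (query-concepts model).
        for other, _ in query_concepts.get(concept, Counter()).most_common(k):
            edges.append((concept, "relatedConcept", other))
    return edges


edges = expand_query_graph(
    "Egypt", ["Country"], CONCEPT_PREDICATES, QUERY_CONCEPTS, k=2
)
for edge in edges:
    print(edge)
```

Capping each type at `k` predicates and `k` concepts is what keeps the view small; the value of `k` here is arbitrary.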
  • 27. Example Query: “What is the capital of Egypt?” Best-attempt results: Answer: Cairo. Result-related information: ➔ latitude: 30.058056 ➔ depiction: [image] ➔ longitude: 31.228889 ➔ population: 6758581 ➔ area: 453000000 ➔ time zone: Eastern European Time ➔ subdivision: Governorates of Egypt ➔ page: http://www.cairo.gov.eg/default.aspx ➔ nickname: The City of a Thousand Minarets, Capital of the Arab World
  • 28. Example Query: “What is the capital of Egypt?” Answer: Cairo. Query-related information: ➔ Cairo Uni., Ain Shams Uni., German Uni., British Uni. (University) ➔ Ittihad El Shorta, El Shams Club, AlNasr Egypt (SportsTeam) ➔ Orascom Telecom, HSBC Bank, EgyptAir, Olympic Grp (Organisation) ➔ Nile River (River) ➔ Al Azhar Park (Park) ➔ Hani Shaker, Sherine, Umm Kulthum, Amr Diab (MusicalArtist) ➔ Nile TV, AL Nile, Al-Baghdadia TV (Broadcaster) ➔ Egyptian Museum, Museum of Islamic Art (Museum)
  • 29. Data Visualization: Proposed approach Step 5: Add concepts and predicates to the user’s query graph. [Figure: query graph showing the query instance, the most queried predicates with “Country”, and the most queried concepts with “Country”]