Characterizing Machine Agent Behavior through SPARQL Query Mining
Aravindan Raghuveer
Yahoo! Inc, Bangalore.
aravindr@yahoo-inc.com

Introduction: LOD Users

                 The LOD cloud has two types of users
                      - Humans (browsers).
                      - Programs / machine agents.




Yahoo! Confidential
Introduction: LOD Access Methods

                 The data on the LOD cloud can be accessed
                  in multiple ways.

                 For this work, we categorize them into two
                  buckets:
                      - SPARQL : A powerful declarative graph query
                        language

                      - Non-SPARQL: Direct linked data requests.
Motivation: User Behavior Understanding

                 A deep understanding of client behavior can help
                 build “better” serving systems.
                 Better:
                    - Secure
                    - Scalable
                    - Available
                 Prior Work:
                    - Moller et al., WebSci 2010
                    - Picalausa et al., SWIM 2011
                    - Kirchberg et al., USEWOD 2011
                    - Mario et al., USEWOD 2011

Summarizing. . .

                                   Human Users   Machine Agents
                      Non-SPARQL
                      SPARQL                     This paper’s focus

What is this paper about?

                 Mining of the USEWOD query log dataset to
                  identify:

                      - Two Trends in Machine Agent Querying

                      - Two Patterns in Machine Agent Querying




The USEWOD dataset

                 Query logs of servers hosting a part of the LOD cloud data.

                      Dataset   Type                   # records (million)   % SPARQL
                      bio2rdf   Life sciences          ~ 0.2                 100%
                      lgd       Geo                    ~ 1.9                 100%
                      SWDF      Conference             ~ 16.7                43.38%
                      dbpedia   Structured wikipedia   ~ 36.2                46.9%

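The record counts and SPARQL percentages in the USEWOD table can be combined into a rough count of SPARQL records per dataset. The sketch below only multiplies the rounded table values, so the results are approximate; the function name `sparql_records_millions` is made up for this illustration. The dbpedia estimate comes out close to the 17 million figure quoted on the Trend-2 slide, which suggests (but does not prove) that the two numbers refer to the same set of queries.

```python
# Rounded values from the dataset table: (records in millions, % SPARQL).
datasets = {
    "bio2rdf": (0.2, 100.0),
    "lgd": (1.9, 100.0),
    "SWDF": (16.7, 43.38),
    "dbpedia": (36.2, 46.9),
}


def sparql_records_millions(records_millions, percent_sparql):
    """Estimated number of SPARQL records, in millions."""
    return records_millions * percent_sparql / 100.0


# Estimates are rounded to two decimals because the inputs are approximate.
estimates = {
    name: round(sparql_records_millions(n, p), 2)
    for name, (n, p) in datasets.items()
}

for name, est in estimates.items():
    print(f"{name}: ~{est} million SPARQL records")
```
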
Part-1: Two Trends in Machine Agent Querying

                                 The Theme

                      “What are the overarching trends for
                              SPARQL queries?”

Trend-1: SPARQL is here to stay!

                      [Figure: SPARQL query volume, panels SWDF and Dbpedia; annotated range 0.1 – 1 million]

                      Take-away: SPARQL query volume is pretty significant

Trend-2: SPARQL is heavily used by machine agents.

           Took the user-agent fields from 17 million dbpedia SPARQL queries and ..

           [Figure not recovered]

Part-2: Two Patterns in Machine Agent Querying

                                     The Theme

                      “Looking at SPARQL query logs, can we reason
                      about the program that generated the queries?”

Salient aspects of proposed Query Mining Techniques

                 Move from per-query analysis to query session
                 analysis

                 Move from query analysis to query result analysis

Pattern-1: Loops in Programs

                                    Take-away

           •      Through per-user, temporal mining of the logs, we
                  discover patterns that are caused by loops in the
                  querying program.

           •      Significant support in all 4 datasets

Per-user Temporal mining

           [Figure: pipeline along a TIME axis; labels: Original Logs, User level (User-1, User-2, User-3, User-4), Session Analysis, Loop]

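The per-user, temporal view can be sketched as follows: split the interleaved log by user, order each user's queries by time, and cut the stream into sessions. This is a minimal sketch, not the paper's procedure. The slides do not say how a user is identified or how sessions are delimited, so the `LogRecord` fields and the 30-minute inactivity gap (`max_gap_seconds`) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    user: str         # hypothetical user key, e.g. anonymised IP plus user agent
    timestamp: float  # seconds since some epoch
    query: str


def split_sessions(records, max_gap_seconds=1800.0):
    """Group records per user, order them by time, and start a new session
    whenever two successive queries are more than max_gap_seconds apart."""
    per_user = defaultdict(list)
    for rec in records:
        per_user[rec.user].append(rec)

    sessions = {}
    for user, recs in per_user.items():
        recs.sort(key=lambda r: r.timestamp)
        user_sessions = [[recs[0]]]
        for prev, cur in zip(recs, recs[1:]):
            if cur.timestamp - prev.timestamp > max_gap_seconds:
                user_sessions.append([cur])      # gap too long: new session
            else:
                user_sessions[-1].append(cur)    # same session continues
        sessions[user] = user_sessions
    return sessions


# Interleaved toy log: User-1 has two bursts separated by a long pause.
log = [
    LogRecord("User-1", 0.0, "q1"),
    LogRecord("User-2", 5.0, "q2"),
    LogRecord("User-1", 10.0, "q3"),
    LogRecord("User-1", 5000.0, "q4"),
]
sessions = split_sessions(log)
```

Loop detection then runs on each session separately, so queries from different users never end up in the same sequence.
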
Intra Pattern Loop

                 Successive queries from the same user use the
                 same “template”.
                 Example: two successive queries:
                      SELECT * WHERE { <http://bio2rdf.org/dr:D00332>
                      <http://bio2rdf.org/ns/bio2rdf#xRef>
                      <http://bio2rdf.org/cas:54-47-7> }

                      SELECT * WHERE { <http://bio2rdf.org/dr:D00333>
                      <http://bio2rdf.org/ns/bio2rdf#xRef>
                      <http://bio2rdf.org/cas:54-47-7> }

           Only the subject (D00332, D00333) varies.

Detecting Intra Pattern Loop

                 We convert a query to its canonical form by
                 replacing variables, URIs and literals with
                 “keywords”.

                 Canonical form of the previous queries:
                      SELECT * WHERE { _URI_ _URI_ _URI_ }

                 Queries generated by the same template will have
                 the same canonical form.

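The canonical-form idea can be sketched with a few regular expressions. Only the `_URI_` keyword appears on the slide; `_VAR_` and `_LITERAL_` are placeholder names assumed here, and the regex tokenisation is a simplification (it does not handle prefixed names, comments, or every literal syntax), not the paper's implementation. A run of successive queries with the same canonical form is reported as a candidate intra pattern loop; the minimum run length of 2 is also an assumption.

```python
import re
from itertools import groupby

_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')           # double-quoted strings
_URI = re.compile(r"<[^<>\s]*>")                       # <...> full URIs
_VAR = re.compile(r"[?$]\w+")                          # ?x or $x variables
_NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?!\w)")  # bare numbers (LIMIT 10)


def canonical_form(query):
    """Replace URIs, variables and literals by keywords and normalise spacing."""
    q = _LITERAL.sub("_LITERAL_", query)   # strings first: they may contain < or ?
    q = _URI.sub("_URI_", q)
    q = _VAR.sub("_VAR_", q)
    q = _NUMBER.sub("_LITERAL_", q)
    q = re.sub(r"([{}])", r" \1 ", q)      # so "WHERE{" and "WHERE {" agree
    return " ".join(q.split())


def intra_pattern_loops(session_queries, min_length=2):
    """Return (canonical_form, run_length) for each run of successive
    queries in one session that share a canonical form."""
    runs = []
    for form, group in groupby(session_queries, key=canonical_form):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((form, length))
    return runs


session = [
    "SELECT * WHERE {<http://bio2rdf.org/dr:D00332> "
    "<http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7>}",
    "SELECT * WHERE{<http://bio2rdf.org/dr:D00333> "
    "<http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7>}",
    'SELECT ?s WHERE { ?s ?p "x" } LIMIT 10',
]
loops = intra_pattern_loops(session)
```

On this toy session the two bio2rdf queries collapse to the canonical form shown on the slide and form a run of length 2, while the third query has a different canonical form and ends the run.
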
Salient Aspects of Intra Pattern Loops

                 Iterate over a dictionary of values (categorical)

                 Iterate over a numerical range (for example, the LIMIT
                 and OFFSET parameters in SPARQL queries)

                 Multiple levels of nested loops with the same
                 intra loop pattern.

                 Four parameters quantify the above (see the paper)

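To make the categorical versus numerical distinction concrete, the sketch below takes a run of queries that share a template, finds the term positions whose value changes, and labels each as numeric or categorical. This is an illustration only and is not one of the four parameters defined in the paper; the function name `varying_slots`, the regex tokenisation, and the `example.org` URI are assumptions.

```python
import re

# One alternative per term kind: string literal, <URI>, variable, bare number.
_TERM = re.compile(
    r'"(?:[^"\\]|\\.)*"|<[^<>\s]*>|[?$]\w+|(?<![\w.])\d+(?:\.\d+)?(?!\w)'
)
_NUMERIC = re.compile(r"\d+(?:\.\d+)?")


def varying_slots(queries):
    """Map each term position that changes across the queries to
    'numeric' or 'categorical'."""
    rows = [_TERM.findall(q) for q in queries]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("queries do not share a template")
    slots = {}
    for position, values in enumerate(zip(*rows)):
        if len(set(values)) > 1:
            numeric = all(_NUMERIC.fullmatch(v) for v in values)
            slots[position] = "numeric" if numeric else "categorical"
    return slots


# Paging loop: only the OFFSET value (term position 4) changes.
paging = [
    f"SELECT ?s WHERE {{ ?s a <http://example.org/Thing> }} LIMIT 100 OFFSET {off}"
    for off in (0, 100, 200)
]

# Dictionary loop: only the subject URI (term position 0) changes.
lookup = [
    f"SELECT * WHERE {{ <http://bio2rdf.org/dr:{drug}> "
    "<http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }"
    for drug in ("D00332", "D00333")
]

paging_slots = varying_slots(paging)
lookup_slots = varying_slots(lookup)
```
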
Inter Pattern Loops

                 Found loops that iterate over a set of patterns:
                      P1, P2, P3, P1, P2, P3, P1, P2, P3

                 Typically used when the output of the first query
                 is passed as a parameter to the second query
                 (examples in the paper).

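A repeating block such as P1, P2, P3 can be found by testing candidate periods on the sequence of patterns (for instance, canonical forms) in a session. The sketch below is deliberately simple and is not the paper's algorithm: it only looks for a cycle that starts at the beginning of the sequence, and the name `find_inter_pattern_loop` and the `min_repeats` threshold are assumptions.

```python
def find_inter_pattern_loop(patterns, min_repeats=2):
    """Return (cycle, repeats) for the shortest block of at least two
    distinct patterns that repeats from the start of the sequence,
    or None if there is no such block."""
    n = len(patterns)
    for period in range(2, n // min_repeats + 1):
        cycle = patterns[:period]
        if len(set(cycle)) < 2:
            continue  # one pattern repeated is an intra pattern loop instead
        repeats = 0
        # Count how many consecutive copies of the cycle the sequence starts with.
        while patterns[repeats * period:(repeats + 1) * period] == cycle:
            repeats += 1
        if repeats >= min_repeats:
            return cycle, repeats
    return None


sequence = ["P1", "P2", "P3", "P1", "P2", "P3", "P1", "P2", "P3"]
loop = find_inter_pattern_loop(sequence)
```

For the sequence on the slide this reports the cycle P1, P2, P3 with three repetitions; a sequence made of a single repeated pattern returns None.
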
Results

                      Dataset   Support for loops
                      bio2rdf   86%
                      lgd       32%
                      swdf      40%
                      dbpedia   16%

                      Take-away: Significant support for loops!

Pattern-2: Querying for dbpedia Linkage

           Take-away:
           •      By executing each query and analyzing the results,
                  we find that a portion of queries “look” for
                  dbpedia links
           •      Results:
                      -   Over 20 months of SWDF queries, an average of 8%
                          look for dbpedia URLs
                      -   Over 2 days' worth of lgd queries, 26.5% look
                          for dbpedia URLs

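The result-analysis step can be sketched as a check over query results that have already been fetched. The sketch assumes results in the standard SPARQL JSON results layout (`results` / `bindings`, each term with `type` and `value`) and treats a query as "looking for dbpedia" when any returned URI has a `dbpedia.org` host. The slide does not give the exact criterion used in the paper, so this test and the function names are assumptions.

```python
from urllib.parse import urlparse


def is_dbpedia_uri(value):
    """True if the URI's host is dbpedia.org or one of its subdomains."""
    host = urlparse(value).hostname or ""
    return host == "dbpedia.org" or host.endswith(".dbpedia.org")


def looks_for_dbpedia(results):
    """True if any URI bound in the result set points at dbpedia."""
    for binding in results.get("results", {}).get("bindings", []):
        for term in binding.values():
            if term.get("type") == "uri" and is_dbpedia_uri(term.get("value", "")):
                return True
    return False


def dbpedia_share(all_results):
    """Fraction of result sets that contain at least one dbpedia URI."""
    if not all_results:
        return 0.0
    hits = sum(1 for r in all_results if looks_for_dbpedia(r))
    return hits / len(all_results)


# Two toy result sets: one returns a dbpedia resource, one a plain literal.
with_link = {"results": {"bindings": [
    {"o": {"type": "uri", "value": "http://dbpedia.org/resource/Berlin"}},
]}}
without_link = {"results": {"bindings": [
    {"o": {"type": "literal", "value": "Berlin"}},
]}}
share = dbpedia_share([with_link, without_link])
```
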
Summary & Conclusions

                 Proposed 2 new ways of SPARQL query mining:
                      - Session view
                      - Analyze results in addition to the query
                 Showed that machine agents look for dbpedia using the
                 owl:sameAs annotation.

                 Influence on system design:
                      - Can we pre-fetch elements in a loop beforehand?
                      - Prioritize dbpedia attributes for caching
                 Influence on log collection & analysis:
                      - Stratified random sampling to remove the effect of loops.

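One way to read the sampling suggestion: if a looping agent issues the same template thousands of times, a uniform sample of the log is dominated by that loop, whereas sampling a fixed number of records per stratum is not. The sketch below uses the query template as the stratum. The slides do not say which strata were intended, so that choice, the `per_stratum` cap, and the toy records are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, per_stratum, seed=0):
    """Draw up to per_stratum records uniformly at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Toy log of (user, template) pairs: one bot loops 1000 times over template T1.
records = [("bot", "T1")] * 1000 + [("u1", "T2"), ("u2", "T2"), ("u3", "T3")]
sample = stratified_sample(records, stratum_of=lambda rec: rec[1], per_stratum=2)
```

Here the looping template contributes 2 of the 5 sampled records instead of almost all of them.
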
Thank you

                      For the great data!!
                      For the great feedback & comments
                      For listening!

The famous LOD Cloud . . .

                      7 billion triples and counting!!
