Moving beyond sameAs with PLATO:
Partonomy detection for Linked Data
     Prateek Jain, Pascal Hitzler, Amit Sheth
                Kno.e.sis Center
      Wright State University, Dayton, OH

          Peter Z. Yeh, Kunal Verma
          Accenture Technology Labs
                 San Jose, CA

        May 2012 – GE Conference 2012 – Prateek Jain
        23rd ACM HT Global Research – Prateek Jain
Outline


• Introduction - Linked Open Data

• Challenges

• PLATO – Partonomic Relationship detection

• Conclusion & Future Work




Tim Berners-Lee 2006

• from http://www.w3.org/DesignIssues/LinkedData.html



1.   Use URIs as names for things.
2.   Use HTTP URIs so that people can look up those names.
3.   When someone looks up a URI, provide useful information, using the
     standards (RDF*, SPARQL).
4.   Include links to other URIs, so that they can discover more things.
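
The four principles can be illustrated with a minimal sketch. The example.org and example.com URIs and the helper `outgoing_links` are made up for this illustration; only the owl:sameAs and rdfs:label property URIs are real W3C identifiers.

```python
# Illustrative only: the example.org / example.com URIs are hypothetical.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Principles 1 and 2: a thing is named with an HTTP URI.
cell_wall = "http://example.org/resource/Cell_wall"

# Principle 3: looking up the URI should return useful data, modelled here
# as a list of (subject, predicate, object) triples.
triples = [
    (cell_wall, RDFS_LABEL, "Cell wall"),
    # Principle 4: a link to a URI in another (hypothetical) dataset.
    (cell_wall, OWL_SAME_AS, "http://example.com/other-dataset/CellWall"),
]


def outgoing_links(subject, data):
    """Return the HTTP URIs that `subject` links to."""
    return [o for s, p, o in data if s == subject and o.startswith("http")]


print(outgoing_links(cell_wall, triples))
```

In this toy data the only link is an owl:sameAs statement, which is the kind of link the talk's title proposes to move beyond.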




Linked Open Data 2011




Linked Open Data

Number of datasets

Date          Datasets
2011-09-19    295
2010-09-22    203
2009-07-14    95
2008-09-18    45
2007-10-08    25
2007-05-01    12

Number of triples (Sept 2011): 31,634,213,770, with 503,998,829 out-links




                            From http://www4.wiwiss.fu-berlin.de/lodcloud/state/
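
A quick calculation puts these figures in proportion. It is a sketch that assumes "out-links" counts triples pointing from one dataset to another; the variable names are illustrative.

```python
# Figures copied from the slide (LOD cloud statistics).
total_triples = 31_634_213_770
out_links = 503_998_829
datasets_2007 = 12    # 2007-05-01
datasets_2011 = 295   # 2011-09-19

link_share = out_links / total_triples
growth = datasets_2011 / datasets_2007

print(f"out-links as a share of all triples: {link_share:.2%}")
print(f"dataset count growth, 2007 to 2011: {growth:.1f}x")
```

Under that assumption, out-links make up about 1.59% of all triples, while the number of datasets grew roughly 24.6-fold.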


Mainstream Semantic Web?




      May 2012 –IBM TJ Watson Center– Prateek Jain
Is it really mainstream Semantic Web?

• What is the relationship between the models whose instances are being
  linked?

• How can LOD be queried without knowing the individual datasets?

• How can schema-level reasoning be performed over the LOD cloud?

• A fundamental conceptual relationship, namely “PART OF”, is almost
  absent from LOD




PLATO Approach




Our Approach

Use knowledge contributed by users

• Detection of relationships within and across datasets

[Diagram label: LOD Cloud]




PLATO Approach – Step 1


• PLATO generates all possible partonomically linked pairs between the
  entities in the dataset.
   – Uses “strongly” associated entities

• Identifies the type of each entity in the pair using WordNet.
   – Uses class names
   – WordNet gives the lexicographer files for the synsets corresponding
     to these entities

• Uses this information to determine the applicable OWL partonomy
  properties.
   – Based on Winston’s taxonomy
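
The step above can be sketched as follows. This is not PLATO's implementation: the `LEXNAME` table is a hand-written stand-in for a WordNet lookup, and the `APPLICABLE` table is a hypothetical mapping from pairs of lexicographer files to relation categories named in Winston's taxonomy.

```python
from itertools import permutations

# Hand-written stand-in for a WordNet lookup (entity -> lexicographer file).
# These values are assumptions for the sketch, not real WordNet output.
LEXNAME = {
    "Cell Wall": "noun.object",
    "Cellulose": "noun.substance",
}

# Hypothetical table: (part type, whole type) -> candidate relations,
# labelled with categories from Winston's taxonomy.
APPLICABLE = {
    ("noun.substance", "noun.object"): ["stuff-object"],
    ("noun.object", "noun.object"): ["component-integral object"],
    ("noun.location", "noun.location"): ["place-area"],
}


def candidate_pairs(entities):
    """All ordered (part, whole) pairs of distinct entities."""
    return list(permutations(entities, 2))


def applicable_relations(part, whole):
    """Relations allowed for the pair, based only on the entity types."""
    key = (LEXNAME.get(part), LEXNAME.get(whole))
    return APPLICABLE.get(key, [])


for part, whole in candidate_pairs(["Cell Wall", "Cellulose"]):
    print(part, "->", whole, applicable_relations(part, whole))
```

Filtering by type keeps the later web-based test small: only relations that make sense for the two entity types need to be checked.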



Winston’s Taxonomy




PLATO Approach – Step 2

• PLATO generates linguistic patterns for each applicable property based on
  linguistic cues suggested by Winston.
    – Cell Wall is made of Cellulose
    – Cellulose is made of Cell Wall
    – Cell Wall is partly Cellulose

• Tests the lexical patterns for each entity pair in a corpus-driven manner.
   – Using the Web as a corpus

• PLATO counts the total number of web pages that contain each pattern.
    – Parses each page to identify occurrences of the pattern.
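
A minimal sketch of pattern generation and counting follows. The templates are modelled on the example phrases above, the three "pages" are made-up sentences standing in for web search results, and the function names are illustrative rather than PLATO's own.

```python
import re

# Hypothetical templates, modelled on the example phrases on the slide.
TEMPLATES = ["{a} is made of {b}", "{a} is partly {b}"]


def patterns(x, y):
    """Instantiate every template in both directions for the pair (x, y)."""
    out = []
    for a, b in ((x, y), (y, x)):
        out.extend(t.format(a=a, b=b) for t in TEMPLATES)
    return out


def count_pages(pattern, pages):
    """Count pages containing the pattern (case-insensitive, literal match)."""
    rx = re.compile(re.escape(pattern), re.IGNORECASE)
    return sum(1 for text in pages if rx.search(text))


# Toy corpus standing in for web pages; the sentences are made up.
pages = [
    "In plants the cell wall is made of cellulose and other polymers.",
    "A cell wall is made of cellulose.",
    "Cotton fibre is mostly cellulose.",
]

counts = {p: count_pages(p, pages) for p in patterns("Cell Wall", "Cellulose")}
for pattern, n in counts.items():
    print(pattern, n)
```

Testing both directions of each template matters because the direction of the relationship is not known in advance.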




PLATO Approach – Step 3

• Asserts the partonomy property with the strongest supporting evidence
    – Cell Wall is made of Cellulose, 48
    – Cellulose is made of Cell Wall, 10

• PLATO also enriches the schema by generalizing from the instance-level
  assertions.
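
The selection step can be sketched as a simple maximum over the evidence counts. The counts are the ones in the example above; the triple layout and the function name `strongest` are assumptions for this sketch, and the slide does not say how ties or very low counts are handled.

```python
# Page counts taken from the slide's example.
evidence = {
    ("Cell Wall", "is made of", "Cellulose"): 48,
    ("Cellulose", "is made of", "Cell Wall"): 10,
}


def strongest(evidence):
    """Return the (subject, relation, object) triple with the highest count."""
    return max(evidence, key=evidence.get)


best = strongest(evidence)
print(best, evidence[best])
```

With these counts the asserted statement is that Cell Wall is made of Cellulose, since 48 pages support it against 10 for the reverse direction.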




PLATO Evaluation




Outreach

• Prateek Jain, Pascal Hitzler, Kunal Verma, Peter Z. Yeh and Amit P.
  Sheth, “Moving beyond sameAs with PLATO: Partonomy detection for
  Linked Data”. In Proceedings of the 23rd ACM Hypertext and Social Media
  conference (HT 2012), Milwaukee, WI, USA, June 25th-28th, 2012 (to
  appear)

• Tool available for download at http://wiki.knoesis.org/index.php/PLATO




End Product




Conclusions and Future Work




Conclusions

• PLATO is an approach for partonomic relationship detection

• The approach works for both instance-level and schema-level relationships

• Evaluation was performed within and across large, prominent LOD
  datasets

• Results validate the use of knowledge on the Web to solve tough
  problems




Future Work

• Use incomplete knowledge for part-of relationship identification
   – Machine-learning-based techniques

• Release the schema mappings into the public domain

• Develop a better querying system for LOD using PLATO and BLOOMS
   – Work in progress with ALOQUS (submitted to ODBASE 2012)

• Identify and incorporate user preferences




Questions?



                               Prateek Jain
                           Kno.e.sis Center
       Wright State University, Dayton, OH
http://wiki.knoesis.org/index.php/Prateek


