The Linked Open Data (LOD) Cloud has gained significant traction over the past few years. With over 275 interlinked datasets across diverse domains such as life sciences, geography, and politics, the LOD Cloud has the potential to support a variety of applications ranging from open-domain question answering to drug discovery.
Despite its significant size (approx. 30 billion triples), the data are relatively sparsely interlinked (approx. 400 million links). A semantically richer LOD Cloud is needed to fully realize its potential. Data in the LOD Cloud are currently interlinked mainly via the owl:sameAs property, which is inadequate for many applications. Additional properties capturing relations based on causality or partonomy are needed to answer complex questions and to support such applications.
In this work, we present a solution to enrich the LOD Cloud by automatically detecting partonomic relationships, which are well-established, fundamental properties grounded in linguistics and philosophy. We empirically evaluate our solution across several domains and show that it performs well at detecting partonomic properties between LOD Cloud data.
1. Moving beyond sameAs with PLATO: Partonomy detection for Linked Data
Prateek Jain, Pascal Hitzler, Amit Sheth
Kno.e.sis Center
Wright State University, Dayton, OH
Peter Z. Yeh, Kunal Verma
Accenture Technology Labs
San Jose, CA
May 2012 – GE Conference 2012 – Prateek Jain
23rd ACM HT Global Research – Prateek Jain
2. Outline
• Introduction - Linked Open Data
• Challenges
• PLATO – Partonomic Relationship detection
• Conclusion & Future Work
3. Tim Berners-Lee 2006
• From http://www.w3.org/DesignIssues/LinkedData.html
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs so that they can discover more things.
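Principles 2 and 3 together describe a lookup protocol: an HTTP URI is dereferenced and the server answers with a description in a standard format such as RDF. A minimal sketch using Python's standard `urllib.request`, assuming the publisher supports content negotiation for Turtle; the DBpedia-style URI and the helper name `build_lookup_request` are illustrative, and the request is only built, not sent.

```python
from urllib.request import Request


def build_lookup_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET that dereferences a Linked Data URI.

    Assumes the publisher honours content negotiation, so asking for
    Turtle in the Accept header should yield an RDF description.
    """
    if not uri.startswith(("http://", "https://")):
        # Principle 2: names should be HTTP URIs so that they can be looked up.
        raise ValueError("Linked Data names should be HTTP(S) URIs")
    return Request(uri, headers={"Accept": "text/turtle"})


# Illustrative URI only; any HTTP URI of a Linked Data resource would do.
req = build_lookup_request("http://dbpedia.org/resource/Cellulose")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; what comes back depends on the formats the publisher really serves.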
4. Linked Open Data 2011
5. Linked Open Data
Date          Number of datasets
2011-09-19    295
2010-09-22    203
2009-07-14    95
2008-09-18    45
2007-10-08    25
2007-05-01    12
Sept 2011: 31,634,213,770 triples with 503,998,829 out-links
From http://www4.wiwiss.fu-berlin.de/lodcloud/state/
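The figures on this slide make the sparsity argument from the abstract concrete. The back-of-the-envelope arithmetic below uses only the numbers quoted here, taking the out-link count as the September 2011 figure; it is an illustration, not part of PLATO.

```python
# Figures quoted on this slide (LOD Cloud state, September 2011).
TRIPLES_2011 = 31_634_213_770
OUT_LINKS_2011 = 503_998_829

# Number of datasets per snapshot date, also from this slide.
DATASETS = {
    "2007-05-01": 12,
    "2007-10-08": 25,
    "2008-09-18": 45,
    "2009-07-14": 95,
    "2010-09-22": 203,
    "2011-09-19": 295,
}

# Out-links (links into other datasets) as a fraction of all triples.
link_ratio = OUT_LINKS_2011 / TRIPLES_2011
# Growth in the number of datasets between the first and last snapshot.
growth = DATASETS["2011-09-19"] / DATASETS["2007-05-01"]

print(f"out-links per 100 triples: {100 * link_ratio:.2f}")
print(f"dataset growth since May 2007: {growth:.1f}x")
```

Roughly 1.6 of every 100 triples links to another dataset, even though the number of datasets grew about 25-fold between May 2007 and September 2011.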
9. Is it really mainstream Semantic Web?
• What is the relationship between the models whose instances are being linked?
• How can LOD be queried without knowing the individual datasets?
• How can schema-level reasoning be performed over the LOD Cloud?
• A fundamental conceptual relationship, “PART OF”, is almost entirely absent from LOD
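Why owl:sameAs cannot stand in for “part of” follows from its semantics: in OWL, owl:sameAs states identity, so it is reflexive, symmetric and transitive, and every chain of sameAs links collapses its resources into a single identity class. The sketch below computes those classes with union-find over a few hypothetical triples; the prefixed names and the `ex:madeOf` property are illustrative assumptions, not actual LOD data or PLATO output.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; names are illustrative only.
TRIPLES = [
    ("dbp:Cellulose", "owl:sameAs", "geo:Cellulose"),
    ("geo:Cellulose", "owl:sameAs", "yago:Cellulose"),
    ("dbp:Cell_wall", "ex:madeOf", "dbp:Cellulose"),  # hypothetical partonomy property
]


def identity_classes(triples):
    """Group resources that owl:sameAs links (an equivalence relation) make identical."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            # Merging the two classes accounts for symmetry and transitivity at once.
            parent[find(s)] = find(o)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


print(identity_classes(TRIPLES))
# Misusing owl:sameAs for the part-whole link collapses all four names into one class.
print(identity_classes([(s, "owl:sameAs", o) for s, _, o in TRIPLES]))
```

The second call shows the problem: linking the cell wall to cellulose with owl:sameAs would declare them the same resource, whereas a dedicated, directional partonomy property keeps them distinct and records which one is the part.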
11. Our Approach
Use knowledge contributed by users
• Detection of relationships within and across datasets in the LOD Cloud
12. PLATO Approach
• PLATO generates all candidate pairs of entities in the dataset that may be partonomically linked.
– Uses only “strongly” associated entities
• Identifies the type of each entity in the pair using WordNet.
– Uses class names
– WordNet gives the lexicographer files for the synsets corresponding to these entities
• Uses this information to determine the applicable OWL partonomy properties.
– Based on Winston’s taxonomy of part-whole relations
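The first step can be sketched as follows. This is not PLATO's code: entity types are passed in as WordNet lexicographer file names (e.g. `noun.substance`) instead of being looked up from class names, the “strongly associated” filter is reduced to a given set of pairs, and the `WINSTON_RULES` table from type pairs to Winston's categories (component-integral object, member-collection, portion-mass, stuff-object, feature-activity, place-area) is a hypothetical simplification. All function names are illustrative.

```python
from itertools import combinations

# Hypothetical rules: (type of part, type of whole), as WordNet lexicographer
# file names -> Winston part-whole categories that may apply. Each category
# stands in for one OWL partonomy property. Lookups try both orders, because
# which entity is the part is not known yet.
WINSTON_RULES = {
    ("noun.substance", "noun.body"): ["stuff-object"],
    ("noun.substance", "noun.artifact"): ["stuff-object"],
    ("noun.artifact", "noun.artifact"): ["component-integral object"],
    ("noun.person", "noun.group"): ["member-collection"],
    ("noun.location", "noun.location"): ["place-area"],
    ("noun.act", "noun.act"): ["feature-activity"],
}


def candidate_pairs(entities, associated):
    """Unordered pairs of 'strongly' associated entities.

    The direction (which member is the part) is left open; Web evidence decides later.
    """
    return [(a, b) for a, b in combinations(entities, 2)
            if frozenset((a, b)) in associated]


def applicable_relations(pairs, lexfile):
    """Look up the Winston categories licensed by the two entity types."""
    result = {}
    for a, b in pairs:
        ta, tb = lexfile[a], lexfile[b]
        cats = WINSTON_RULES.get((ta, tb)) or WINSTON_RULES.get((tb, ta)) or []
        if cats:
            result[(a, b)] = cats
    return result


# Assumed lexicographer files (hypothetical here, not looked up in WordNet).
lexfile = {"Cellulose": "noun.substance", "Cell Wall": "noun.body",
           "Dayton": "noun.location"}
associated = {frozenset(("Cellulose", "Cell Wall"))}

pairs = candidate_pairs(list(lexfile), associated)
print(applicable_relations(pairs, lexfile))
```

Keeping the pairs unordered at this stage matches the next step, where patterns for both orientations are tested against the corpus.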
14. PLATO Approach – Step 2
• PLATO generates linguistic patterns for each applicable property based on linguistic cues suggested by Winston.
– Cell Wall is made of Cellulose
– Cellulose is made of Cell Wall
– Cell Wall is partly Cellulose
• Tests the lexical patterns for each entity pair in a corpus-driven manner.
– Using Web as a corpus
• PLATO counts the total number of web pages that contain the pattern
– Parses each page and identifies occurrences of the pattern.
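A sketch of this step under simplifying assumptions: a short list of strings stands in for the Web, each string counts as one “page”, and an occurrence is a case-insensitive substring match. The templates reuse the “made of” and “partly” cues from this slide for the stuff-object category only; the `PATTERNS` table and function names are hypothetical, and PLATO's full pattern set is not reproduced.

```python
# Hypothetical templates for the stuff-object category, built from the cues on this slide.
PATTERNS = {
    "stuff-object": ["{whole} is made of {part}", "{whole} is partly {part}"],
}


def generate_patterns(a, b, category):
    """Instantiate each template in both orientations, since the direction is unknown."""
    patterns = []
    for template in PATTERNS[category]:
        patterns.append(template.format(whole=a, part=b))
        patterns.append(template.format(whole=b, part=a))
    return patterns


def page_counts(patterns, corpus):
    """For each pattern, count the pages (corpus entries) that contain it at least once."""
    return {p: sum(1 for page in corpus if p.lower() in page.lower())
            for p in patterns}


# A toy corpus standing in for the Web; each string is one "page".
corpus = [
    "The cell wall is made of cellulose and other polysaccharides.",
    "In plants, the cell wall is made of cellulose.",
    "A cell wall is partly cellulose.",
    "Nothing relevant here.",
]

counts = page_counts(generate_patterns("Cell Wall", "Cellulose", "stuff-object"), corpus)
for pattern, n in counts.items():
    print(n, pattern)
```

Substring matching is a deliberate simplification; it ignores inflection and sentence boundaries, which a real page parser would have to handle.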
15. PLATO Approach – Step 3
• Asserts the partonomy property with the strongest supporting evidence
– Cell Wall is made of Cellulose, 48
– Cellulose is made of Cell Wall, 10
• PLATO also enriches the schema by generalizing from the instance-level assertions.
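The third step amounts to an arg-max over the page counts, followed by a lift from instances to classes. In the sketch below, the minimum-count threshold, the refusal to assert on ties, and the majority-vote rule for schema generalization are hypothetical choices, not PLATO's published criteria; the counts 48 and 10 are the ones shown on this slide, while the class names and the chitin example are illustrative.

```python
from collections import Counter


def strongest(evidence, min_count=1):
    """Pattern with the most supporting pages; None if evidence is too weak or tied."""
    ranked = sorted(evidence.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_count:
        return None
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # no clear winner: assert nothing
    return ranked[0][0]


def generalize(instance_assertions, class_of, min_share=0.5):
    """Lift (part, relation, whole) instance triples to class level by majority vote.

    A class-level triple is kept when its relation accounts for more than
    min_share of the instance assertions between those two classes.
    """
    votes, totals = Counter(), Counter()
    for part, rel, whole in instance_assertions:
        key = (class_of[part], class_of[whole])
        votes[(key, rel)] += 1
        totals[key] += 1
    return sorted((cp, rel, cw) for ((cp, cw), rel), n in votes.items()
                  if n / totals[(cp, cw)] > min_share)


print(strongest({"Cell Wall is made of Cellulose": 48,
                 "Cellulose is made of Cell Wall": 10}))

# Hypothetical instance assertions and class memberships.
class_of = {"Cellulose": "Polysaccharide", "Chitin": "Polysaccharide",
            "Cell Wall": "CellStructure", "Fungal Cell Wall": "CellStructure"}
facts = [("Cellulose", "stuff-object", "Cell Wall"),
         ("Chitin", "stuff-object", "Fungal Cell Wall")]
print(generalize(facts, class_of))
```

Declining to assert on a tie is a conservative choice: without a clear majority, neither orientation of the part-whole link is better supported than the other.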
17. Outreach
• Prateek Jain, Pascal Hitzler, Kunal Verma, Peter Z. Yeh and Amit P. Sheth, “Moving beyond sameAs with PLATO: Partonomy detection for Linked Data”. In Proceedings of the 23rd ACM Hypertext and Social Media conference (HT 2012), Milwaukee, WI, USA, June 25th-28th, 2012 (to appear)
• Tool available for download at http://wiki.knoesis.org/index.php/PLATO
18. End Product
19. Conclusions and Future Work
20. Conclusions
• PLATO is an approach for partonomic relationship detection
• The approach works for both instance-level and schema-level relationships
• Evaluation was performed within and across large, prominent LOD datasets
• Results validate the use of knowledge on the Web to solve hard problems such as partonomy detection
21. Future Work
• Use incomplete knowledge for part-of relationship identification
– Machine-learning-based techniques
• Release the schema mappings in the public domain
• Develop a better querying system for LOD using PLATO and BLOOMS
• Work in progress with ALOQUS (submitted to ODBASE 2012)
• Identify and incorporate user preferences
22. Questions?
Prateek Jain
Kno.e.sis Center
Wright State University, Dayton, OH
http://wiki.knoesis.org/index.php/Prateek