Semantics-enhanced Cyberinfrastructure for ICMSE :
Interoperability, Analytics, and Applications
Krishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Relevant Funded Projects :
A Brush with Pain Points and Promise
• Semantic Web-based Data Exchange and
Interoperability for OEM-Supplier Collaboration
(Pratt and Whitney) (2014-2015)
• KDDM: Federated Semantic Services Platform
for Open Materials Science and Engineering
(AFRL) (2013-2016)
• Computer Assisted Document Interpretation
Tools. (NSF SBIR Phases I and II with Cohesia
Corp.) (1999-2002)
• Document => Materials and Process Specs (alloys)
Selected URLs and Publications
• http://www.knoesis.org/?q=research/semMat
• http://wiki.knoesis.org/index.php/MaterialWays
• Nishita Jaykumar, PavanKalyan Yallamelli, Vinh Nguyen, Sarasi
Lalithsena, Krishnaprasad Thirunarayan, Amit Sheth. KnowledgeWiki:
An OpenSource Tool for Creating Community-Curated Vocabulary,
with a Use Case in Materials Science. In LDOW - WWW 2016.
Montreal, Canada; 2016.
• Vinh Nguyen, Olivier Bodenreider, Amit Sheth. Don't like RDF
Reification? Making Statements about Statements using Singleton
Property. 23rd International conference on World Wide Web (WWW
2014). NY: ACM; 2014. p. 759-770.
• Krishnaprasad Thirunarayan, Amit Sheth, Kalpa Gunaratna, Vinh
Nguyen, Siva Cheekula, Sarasi Lalithsena, Nishita Jaykumar, Swapnil
Soni, Clare Paul. Architecture and Prototype for Materials Knowledge
Management System using Semantic Web Technologies and
Techniques: A Preliminary Report. WSU, 2014
Selected URLs and Publications
• Krishnaprasad Thirunarayan, On Embedding Machine-
Processable Semantics into Documents, In: IEEE Transactions
on Knowledge and Data Engineering, Vol. 17, No. 7, pp. 1014-
1018, July 2005.
• K. Thirunarayan, A. Berkovich, and D. Sokol, An Information
Extraction Approach to Reorganizing and Summarizing
Specifications, In: Information and Software Technology
Journal, Vol. 47, Issue 4, pp. 215-232, 2005.
• K. Thirunarayan, A. Berkovich, and D. Sokol, Semi-automatic
Content Extraction from Specifications, In: Proceedings of 6th
International Conference on Applications of Natural Language
to Information Systems, LNCS 2553, pp. 40-51, June 2002.
Outline
• Domain Goals and Challenges
• Utility and Continuum of Machine-Processable Semantics : An
Architecture
• What?: Nature of Data and Granularity of Semantics
• Why?: Lightweight semantics and its benefits
• How?: Community-ratified Ontologies
+ Semantic Annotations of Data and Documents
+ Linked Open Materials Data
• Applications:
• (Skip) Long-term Research: Processing Tabular Data
• Integrating vocabularies : Matvocab KnowledgeWiki use case
• Document Annotation : Biomaterials use case
• Visualization and Navigation : iExplore
• Private-Public Data Sharing
• Conclusion
Domain Goals and Challenges
• Sharing, discovery, and application of materials
science and engineering data and documents are
possible only if domain scientists are able and
willing to share.
• Technological challenges
– Computational tools and repositories conducive to easy
exchange, curation, attribution, and analysis of data
• Cultural challenges
– Proper protection, control, and credit for sharing data
Our Thesis / Value Proposition
Associating machine-processable semantics
with materials science and engineering data
and documents can help overcome
challenges associated with data discovery,
integration and interoperability caused by
data heterogeneity.
What?: Nature of Data
• Structured Data (e.g., relational)
• Semi-structured, Heterogeneous Documents
(e.g., publications and technical specs which
usually include text, numerics, units of measure,
images and equations)
• Tabular data (e.g., ad hoc spreadsheets and
complex tables incorporating “irregular” entries)
Fragment of Materials and Process spec for:
Ti Alloy Bars, Wire, Forgings, and Rings.
What?: Granularity of Semantics and Applications: Examples
• Synonyms
– Chemistry, Chemical Composition, Chemical Analysis, ...
– Bend Test, Bending, ...
– Delivery Condition, Process/Surface Finish, Temper, "as received by
purchaser", ...
• Co-reference vs broadening/narrowing
– Tubing vs welded tubing vs flash-welded part
• Capturing characteristic-value pairs
– Recognize and Normalize: “0.1 inch and under in nominal thickness”
is translated to “Thickness <= 0.1 in”.
– Glean elided characteristic: controlled term “solution heat treated”
implies the attribute “heat treat type”.
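A minimal sketch of the synonym mapping and "recognize and normalize" steps described above. The synonym table, the regular expression, and the function names are illustrative assumptions, not the tool's actual implementation; mapping "and over" to ">=" is likewise assumed.

```python
import re

# Hypothetical synonym table built from the examples on this slide.
SYNONYMS = {
    "chemical composition": "Chemistry",
    "chemical analysis": "Chemistry",
    "bending": "Bend Test",
}

# Matches phrases such as "0.1 inch and under in nominal thickness".
PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*inch(?:es)?\s+and\s+(?P<dir>under|over)"
    r"\s+in\s+nominal\s+(?P<attr>\w+)",
    re.IGNORECASE,
)


def canonical_term(term: str) -> str:
    """Map a synonym to its preferred label; unknown terms pass through."""
    return SYNONYMS.get(term.strip().lower(), term.strip())


def normalize_constraint(text: str):
    """Return (attribute, operator, value, unit), or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Assumption: "and under" means <=, "and over" means >=.
    op = "<=" if m.group("dir").lower() == "under" else ">="
    return (m.group("attr").capitalize(), op, float(m.group("value")), "in")


constraint = normalize_constraint("0.1 inch and under in nominal thickness")
```

Here `constraint` holds the characteristic-value pair in a form that can be compared or queried, rather than free text.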
1, 2, 3 of Semantic Web
1
• Ontology: Agreement about a common
vocabulary/nomenclature, conceptual models and
domain knowledge
– Codified as Schema + Knowledge Base.
– Agreement is what enables interoperability.
– Formal machine processable description is what
leads to automation.
2
• Semantic Annotation (Metadata Extraction):
Associating meaning with data, or labeling data so
it is more meaningful to the system and people.
– Manual
– Semi-automatic (automatic with human
verification)
– Automatic
3
• Reasoning/Computation:
– Semantics enabled search
– Data integration
– Answering complex queries and making connections
(paths, sub-graphs)
– Analyses including pattern discovery, mining, hypothesis
validation
– Visualization
How to integrate well? From Syntax to Semantics
Using Semantics to Climb Levels of Abstraction: an example
3 Interpreted data (abductive) [in OWL], e.g., diagnosis: Hyperthyroidism, ……
2 Interpreted data (deductive) [in OWL], e.g., threshold: Elevated Blood Pressure
1 Annotated Data [in RDF], e.g., label: Systolic blood pressure of 150 mmHg
0 Raw Data [in TEXT], e.g., number: “150”
[Figure labels: SSN Ontology, Intellego]
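The climb from a raw number to a labeled observation to a threshold-based interpretation can be sketched as below. The threshold value and all names are assumptions made for illustration (not taken from the slides and not clinical guidance); the abductive step to a diagnosis needs domain knowledge and is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative threshold only; an assumption, not a value from the slides.
ELEVATED_SYSTOLIC_MMHG = 140.0


@dataclass(frozen=True)
class Observation:
    property_name: str
    value: float
    unit: str


def annotate(raw: str) -> Observation:
    """Level 0 -> 1: attach a label and a unit to a raw number."""
    return Observation("systolic blood pressure", float(raw), "mmHg")


def interpret(obs: Observation) -> Optional[str]:
    """Level 1 -> 2: deductive step, comparing the value to a threshold."""
    if (obs.property_name == "systolic blood pressure"
            and obs.value >= ELEVATED_SYSTOLIC_MMHG):
        return "Elevated Blood Pressure"
    return None  # no rule fired


observation = annotate("150")
abstraction = interpret(observation)
```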
Semantic Web Data
[Figure: Subject --Predicate--> Object]
A triple is in the format (Subject, Predicate, Object).
An RDF Dataset is a set of triples.
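As a rough illustration of "a dataset is a set of triples", the sketch below models triples as named tuples in a Python set. The `ex:` identifiers and the `objects` helper are hypothetical; a real system would use an RDF library and a triple store.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def objects(dataset: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object asserted for the given subject and predicate."""
    return {t.object for t in dataset
            if t.subject == subject and t.predicate == predicate}


dataset = {
    Triple("ex:Autoclave", "ex:hasDefinition", "A closed vessel for producing…"),
    Triple("ex:Autoclave", "rdf:type", "ex:Equipment"),
    Triple("ex:Autoclave", "rdf:type", "ex:Equipment"),  # duplicate; a set keeps one copy
}
```

Because the dataset is a set, asserting the same triple twice has no effect, which matches the set semantics stated above.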
What?: Granularity of Semantics and Associated Applications
• Lightweight semantics: File and document-level
annotation to enable discovery and sharing
• Richer semantics: Data-level annotation and
extraction for semantic search and summarization
• Fine-grained semantics: Data integration,
interoperability and reasoning in Linked Open
Materials Science Data
Computer Assisted Document Extraction Tool
[Screenshots: tree/structure view of the spec; typical view of the tagged spec]
Computer Assisted Document Extraction Tool
Example: Procedure Melt Methods
[Screenshots: view of the original spec; tagged spec; tag editor]
Computer Assisted Document Extraction Tool
The SDL
Few More Examples: Procedure Melt Methods
[Screenshot: tag editor]
Why?: Benefits of Lightweight Semantics
• Ease of use by domain experts
– Faster and wider adoption, promoting evolution
• Low upfront cost to support
• Shallow semantics has wider applicability to a
range of documents/data and appeal to a broader
community
• Bottom-line: “Learn to Walk before we Run”
How?: Using Semantic Web Technologies
Machine-processable semantics achieved by
addressing
• Syntactic Heterogeneity: Using XML syntax and
the RDF data model (labelled graph structure)
• Semantic Heterogeneity:
– Using “common” controlled vocabularies, taxonomies
and ontologies
– Using federated data sources, exchanges, querying,
and services
How?: Ingredients for Semantics-based Cyber Infrastructure
• Use of community-ratified controlled vocabularies
and lightweight ontologies (upper-level,
hierarchies)
• Ease registration, publishing, and discovery
• Provide support for provenance and access control
• Track data citation for credit for data sharing
• Semi-automatic annotation of data and documents
: Manual + Automatic
How?: Search Continuum
• Keyword-based full-text search
• + Manually provided content and source metadata
• Uses upper-level ontology
• + Automatically extracted metadata
• Map text to concepts/properties/values
• Semantic + faceted search using background knowledge
• + Deeper semi-automatic content annotation and
extraction
• Aggregating related pieces of information; conditioning
• Integration and Interoperation
• + Linked Open Material Science Data
• + Federated and Faceted Querying and Services
Linked Open Data
• Use “URIs” as identifiers to describe things
http://dbpedia.org/resource/John_F._Kennedy
• Associate descriptions to the identifiers
db:John_F._Kennedy db:Profession db:Politician
Linked Open Data
• Connect things together
db:John_F._Kennedy db:Profession db:Politician
ex:John_Kennedy owl:sameAs db:John_F._Kennedy
ex:John_Kennedy ex:authored_book ex:A_Nation_of_Immigrants
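One way to picture "connecting things together" is to merge the descriptions of identifiers linked by owl:sameAs. The sketch below uses a small union-find over the Kennedy example; the function name is hypothetical, and only subjects are merged, which is a simplification.

```python
from collections import defaultdict

triples = [
    ("db:John_F._Kennedy", "db:Profession", "db:Politician"),
    ("ex:John_Kennedy", "ex:authored_book", "ex:A_Nation_of_Immigrants"),
    ("ex:John_Kennedy", "owl:sameAs", "db:John_F._Kennedy"),
]


def merge_same_as(triples):
    """Group (predicate, object) pairs under one representative per sameAs cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # First pass: union identifiers declared equivalent.
    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)

    # Second pass: attach every other statement to its cluster representative.
    merged = defaultdict(set)
    for s, p, o in triples:
        if p != "owl:sameAs":
            merged[find(s)].add((p, o))
    return dict(merged)


merged = merge_same_as(triples)
```

After merging, the profession from one source and the authored book from the other sit under a single identifier.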
Linked Open Data
Example: Lightweight Semantic Registration of Data
Title of data |
Keywords | Selected from five tier vocabulary provided
Type of data | maps, Excel files, images, text
Data format | structured or unstructured
Description of data | brief unstructured description of content
Contact information of provider(s) | name of provider(s), email for verification, lineage
Spatial extent of data and reference system | location
Temporal extent of data | date range in time or age range if not recent
Date and type of Related Publication(s) | Journal, Thesis, Agency report, not published
Host site for publication | Journal, Library, Personal computer
Access restrictions | copyright regulations
System Architecture and Components
Problems and A Practical Approach
(“When rubber meets the road”)
Deeper Issues: Semantic Formalization
of Tabular Data
skip
Nature of tables
• Compact structures for sharing information
– Minimize duplication
• Types of Tables
– Regular : Dense Grid with explicit schema
information in terms of column and row
headings => Tractable
– Irregular: Sparse Grid with implicit schema and
ad hoc placement of heading => Hard
Challenges Associated with Typical Spreadsheet/Table
• Meant for human consumption
• Irregular :
– Not simple rectangular grid
• Heterogeneous
– All rows not interpreted similarly
• Complex
– Meaning of each row and each column context
dependent
• Footnotes modify meaning of entries (esp. in materials
and process specifications)
Practical Semi-Automatic Content Extraction
• DESIGN: Develop regular data structures that
can be used to formalize tabular information.
– Provide a natural expression of data
– Provide semantics to data, thereby removing potential
ambiguities
– Enable automatic translation
• USE: Manual population of regular tables and
automatic translation into LOD
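The "USE" step, translating a manually populated regular table into linked data, might look roughly like the sketch below. The table contents are made-up placeholders, and the `ex:` prefix and function name are assumptions; footnotes and irregular layouts are exactly what this simple approach does not handle.

```python
import csv
import io

# Hypothetical regular table: one row per alloy, one column per property.
# All values are placeholders for illustration, not real material data.
TABLE = """alloy,tensile_strength_ksi,yield_strength_ksi
Alloy-A,130,120
Alloy-B,150,
"""


def table_to_triples(text, key_column, base="ex:"):
    """Turn each non-empty cell into a (subject, predicate, object) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = base + row[key_column]
        for column, cell in row.items():
            if column == key_column or cell is None or not cell.strip():
                continue  # skip the key column itself and empty cells
            triples.append((subject, base + column, cell.strip()))
    return triples


table_triples = table_to_triples(TABLE, "alloy")
```

The column heading becomes the predicate, which is why an explicit schema in the headings makes regular tables tractable.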
Our applications in
Materials Genome Initiative
Matvocab home page
• Search and discovery
• Annotate documents
• Visualize the knowledge base
• Query vocabulary
• View, edit, and add
• Create and process assertions
Vocabulary Creation / Curation
N. Jaykumar, P. Yallamelli, V. Nguyen, S. Lalithsena,
K. Thirunarayan, A. Sheth, C. Paul:
KnowledgeWiki: An OpenSource Tool for Creating Community
Curated Vocabulary, with a Use Case in Materials Science
(Linked Data on the Web, World Wide Web Conference 2016)
KnowledgeWiki: An OpenSource Tool for Creating
Community-Curated Vocabulary, with a Use Case in
Materials Science
WWW - LDOW 2016, Canada
Nishita Jaykumar, Pavankalyan Yallamelli, Vinh Nguyen,
Sarasi Lalithsena, Krishnaprasad Thirunarayan, Amit Sheth
Kno.e.sis, Wright State University
Clare Paul
Air Force Research Laboratory, Wright-Patterson AFB
Context for Research
• Collaboration with AFRL
• Handbooks: ASM HNDBK, MIL HNDBK-5, MIL HNDBK-17
• Standardized vocabularies: SKOS, Dublin Core, QUDT, VAEM, …
• Crowdsourcing from domain experts
• Consolidated vocabulary (MatVocab)
Motivating Example
Facts:
Name | Definition | Source
A-Basis | The mechanical property value is the value above which … | ASM Handbook, Volume 21: Composites.
ABasis | A statistically-based material property; a 95% lower… | Composite Materials Handbook - Volume 1. MIL-HDBK-17F-1F, 17 June 2002
A-Basis | The lower of either a statistically calculated number… | Metallic Materials and Elements for Aerospace Vehicle Structures, MIL-HDBK-5J, 31 January 2003
Motivating Example
Facts:
Name | Definition | Source
YoungsModulus | The ratio of normal stress to corresponding … | ASM Handbook, Volume 21: Composites.
ModulusYoungs | The ratio of change in stress to change … | MIL-HDBK-17
• The same term can have multiple definitions, each of which needs
to be represented with its provenance information, including
data such as source and time.
Related Work
Auxiliary node approach
[Figure: A-Basis -> Auxiliary node1 -> “A statistically-based material …”, with property labels P26v, P26s, P580q, P582q, …]
• Properties represented in the Wikidata model do not
correspond to RDF properties
• Ad hoc: Lack of formal semantics
• Extension to MediaWiki
• We use the Semantic Forms extension of Semantic
MediaWiki for our task
• Inability to represent metadata about the metadata
Semantic Mediawiki
http://www.slideshare.net/cool_uk/semantic-mediawiki-simple-tutorial
Representing entities and
simple metadata
The '''United Kingdom''' is a
country located in
[[Located in::Europe]].
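To show how such an in-text annotation maps to a triple, here is a small sketch that pulls `[[Property::value]]` pairs out of wikitext. The regular expression and function name are assumptions for illustration; Semantic MediaWiki does its own parsing and supports more syntax than this handles.

```python
import re

# Matches [[Property::value]] and [[Property::value|display text]].
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(page_title, wikitext):
    """Return (page, property, value) triples for each annotation found."""
    return [
        (page_title, m.group(1).strip(), m.group(2).strip())
        for m in ANNOTATION.finditer(wikitext)
    ]


wikitext = ("The '''United Kingdom''' is a country located in "
            "[[Located in::Europe]].")
page_triples = extract_annotations("United Kingdom", wikitext)
```

The page supplies the subject, the text before `::` the property, and the text after it the value.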
Our Approach
• Adopted the Singleton Property method for capturing
triple metadata in SMW
• Importing legacy data with provenance in bulk using
the Singleton Property method
• Importing existing RDF datasets with provenance into
SMW for curation
Singleton Property
Facts:
Subject | Predicate | Object | Source | License
Autoclave | hasDefinition | “A closed vessel for producing…” | MIL-HDBK-17F-1F, 17 | All rights reserved
Singleton Property Translation:
Subject | Predicate | Object
hasDefinition#1 | rdf:sp | hasDefinition
Autoclave | hasDefinition#1 | “A closed vessel for producing…”
hasDefinition#1 | hasSource | MIL-HDBK-17
hasDefinition#1 | hasLicense | All rights reserved
A singleton property represents one specific relationship between two entities in
a certain context. It is assigned a URI, like any other property, and can be considered
a subproperty or an instance of a generic property.
Nguyen, Vinh, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification?: making statements about statements using singleton property." Proceedings of the 23rd International
Conference on World Wide Web. ACM, 2014.
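The translation from one fact with metadata into singleton-property triples can be sketched as follows. The function name and the explicit `index` argument are assumptions; the predicate names (`rdf:sp`, `hasSource`, `hasLicense`) follow the Autoclave example in this deck.

```python
def to_singleton(subject, predicate, obj, metadata, index):
    """Rewrite one annotated fact as singleton-property triples.

    metadata maps metadata predicates (e.g. "hasSource") to values.
    """
    sp = f"{predicate}#{index}"  # unique property for this one statement
    triples = [
        (sp, "rdf:sp", predicate),  # link the singleton to the generic property
        (subject, sp, obj),         # the fact itself, asserted via the singleton
    ]
    # Metadata is attached to the singleton property, not to the subject.
    triples.extend((sp, key, value) for key, value in metadata.items())
    return triples


singleton_triples = to_singleton(
    "Autoclave",
    "hasDefinition",
    "A closed vessel for producing…",
    {"hasSource": "MIL-HDBK-17", "hasLicense": "All rights reserved"},
    index=1,
)
```

Each further definition of the same term would get its own index, so its source and license never mix with those of another definition.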
Why use Singleton Property?
• Formal semantics defined
• Scalable, e.g., to LOD
• Compatible with existing standards
– RDF, RDFS, SPARQL
• Can be used to capture multiple types of metadata
– Provenance, time, location
Fu, Gang, et al. "Exposing Provenance Metadata Using Different RDF Models." arXiv preprint arXiv:1509.02822 (2015).
Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What Works Well With Wikidata?." Proceedings of the 11th International Workshop on Scalable
Semantic Web Knowledge Base Systems co-located with 14th International Semantic Web Conference (ISWC 2015), Bethlehem, PA, USA. 2015.
Singleton vs. Regular Template
Regular template (Autoclave): Definition Text, Image, Source, Rights
Singleton template (Autoclave): Definition Text, Image, Source, Rights, Source, Rights
Regular vs. Singleton templates
Singleton template:
Subject | Predicate | Object
Autoclave | hasDefinition#1 | “A closed vessel…”
hasDefinition#1 | singletonPropertyOf | skos:definition
hasDefinition#1 | source | “ASM Handbook”
hasDefinition#1 | license | “Reproduced by…”
Autoclave | hasImage#1 | “Image.jpg”
hasImage#1 | singletonPropertyOf | mv:image
Regular template:
Subject | Predicate | Object
Autoclave | hasDefinition | “A closed vessel…”
Autoclave | source | “ASM Handbook”
Autoclave | license | “Reproduced by…”
Autoclave | hasImage | “Image.jpg”
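A sketch of why the singleton form matters once a term has more than one definition: each definition can be paired with its own source. The helper name is hypothetical, and the data reuses the two A-Basis definitions from the motivating example with abbreviated source labels.

```python
def definitions_with_source(triples, term, generic="skos:definition"):
    """Pair each definition of a term with the source of that statement."""
    singleton_props = {
        s for s, p, o in triples
        if p == "singletonPropertyOf" and o == generic
    }
    sources = {s: o for s, p, o in triples if p == "source"}
    return [
        (o, sources.get(p))
        for s, p, o in triples
        if s == term and p in singleton_props
    ]


a_basis = [
    ("A-Basis", "hasDefinition#1", "The mechanical property value is the value above which …"),
    ("hasDefinition#1", "singletonPropertyOf", "skos:definition"),
    ("hasDefinition#1", "source", "ASM Handbook, Volume 21: Composites"),
    ("A-Basis", "hasDefinition#2", "The lower of either a statistically calculated number…"),
    ("hasDefinition#2", "singletonPropertyOf", "skos:definition"),
    ("hasDefinition#2", "source", "MIL-HDBK-5J"),
]
pairs = definitions_with_source(a_basis, "A-Basis")
```

With the regular template, both sources would hang off A-Basis directly, and there would be no way to tell which source goes with which definition.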
Overall Architecture
• Properties of interest to domain experts:
– Definition Text
– Source
– License
– Creator
– Abbreviation
– Synonyms
– Units
– …
Use Case in Materials Science
mv: is the matvocab namespace
Statistics of the Vocabulary Import Use Case
# | Type | SMW
1 | Number of vocabularies imported | 3
2 | Total number of terms imported from ASM | 1295
3 | Total number of terms imported from MILHNDBK-5 | 19
4 | Total number of terms imported from MILHNDBK-17 | 179
5 | Total number of Singleton Templates created | 6
6 | Total number of Regular Templates created | 5
7 | Total number of pages created | 1,685
Search & Discovery
Annotate, search, and track provenance
• Vocabulary is used to annotate documents.
• Annotated documents can be indexed.
• Documents can be integrated reliably based
on common terms of interest and
provenance information.
Annotate documents using standard vocabulary
Provenance Metadata
• Explains the origin of an artifact, such as
– How was it created?
– Who created it?
– When was it created?
• Example: for a given material X
– Which processes are involved in making the material and
what are the relevant performance properties?
– What are the inputs, control parameters and outputs of a
process?
– Which research/engineering team performed an
experiment?
Capturing and Exploring provenance metadata - iExplore
generic PMC prepreg --subjected to--> generic hand lay-up --yields--> generic PMC lay-up
generic PMC lay-up --subjected to--> generic autoclave cure --yields--> generic PMC
Capturing and Exploring Vocabulary Provenance - iExplore
[Figure labels: Vocabulary term, Definition, Source, Rights]
Biomaterials Knowledge Extraction :
Protein/Peptides/Amino Acids-Precious Metal Bindings
• Recognition and extraction of crystalline surface
patterns for precious metals (e.g., Gold/Silver
surface patterns via Miller Indices - Au(100),
Au(110), Ag(111)), protein/peptide/amino acid
sequences, and indicators of binding relationship.
– Example Input: They found that an alanine-substituted
peptide (AYSSGAPPAPPF) exhibited the highest
affinity for gold, while a proline-substituted peptide
(AYPPGAPPMPPF) showed almost no affinity.
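A minimal sketch of the recognition step, assuming simple regular expressions are enough for the examples on this slide. Restricting metals to Au/Ag, requiring three-digit Miller indices, and treating any parenthesized run of five or more one-letter amino-acid codes as a peptide are all assumptions; the actual extraction approach is not described here.

```python
import re

# Surface patterns such as Au(100), Au(110), Ag(111).
SURFACE = re.compile(r"\b(Au|Ag)\((\d{3})\)")
# Parenthesized sequences of standard one-letter amino-acid codes.
PEPTIDE = re.compile(r"\(([ACDEFGHIKLMNPQRSTVWY]{5,})\)")


def extract_surfaces(text):
    """Return (metal, Miller index) pairs in order of appearance."""
    return SURFACE.findall(text)


def extract_peptides(text):
    """Return peptide sequences in order of appearance."""
    return PEPTIDE.findall(text)


sentence = (
    "They found that an alanine-substituted peptide (AYSSGAPPAPPF) "
    "exhibited the highest affinity for gold, while a proline-substituted "
    "peptide (AYPPGAPPMPPF) showed almost no affinity."
)
peptides = extract_peptides(sentence)
surfaces = extract_surfaces("Au(100), Au(110), Ag(111)")
```

Linking each peptide to its stated affinity would need a further step that looks at cue phrases such as "highest affinity" and "almost no affinity".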
Semantic Web Based
Data Exchange and Interoperability
for OEM-Supplier Collaboration
Goal and Example Accomplishment
• Implement a Collaboration Platform using Semantic
Web technology in the backend.
– Semantic Web representation (RDF) and querying
(SPARQL) hidden from the users (domain scientists) for
convenience.
• Example functionality incorporated in the “Beta”
version of the PW-11 Collaboration Platform
– Creation of a project by its owner and assigning users to
groups (e.g., ordinary, external, foreign) in a project
– Assigning access control rights based on group/user/file
– Searching, requesting, and uploading files respecting
access restrictions
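The group/user/file access rules listed above could be modeled roughly as below. This is a sketch under stated assumptions, not the platform's design: the class, its methods, the rule that the owner can read everything, and all user and file names are hypothetical, and users and groups share one namespace for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Project:
    owner: str
    groups: Dict[str, Set[str]] = field(default_factory=dict)  # group -> users
    acl: Dict[str, Set[str]] = field(default_factory=dict)     # file -> users/groups

    def assign(self, user: str, group: str) -> None:
        """Owner assigns a user to a group (e.g. ordinary, external, foreign)."""
        self.groups.setdefault(group, set()).add(user)

    def grant(self, filename: str, principal: str) -> None:
        """Grant read access on a file to a user or a group."""
        self.acl.setdefault(filename, set()).add(principal)

    def can_read(self, user: str, filename: str) -> bool:
        if user == self.owner:  # assumption: the owner sees every file
            return True
        allowed = self.acl.get(filename, set())
        if user in allowed:
            return True
        return any(user in self.groups.get(g, set()) for g in allowed)

    def search(self, user: str) -> List[str]:
        """List only the files this user is allowed to see."""
        return sorted(f for f in self.acl if self.can_read(user, f))


project = Project(owner="owner1")
project.assign("alice", "ordinary")
project.assign("bob", "external")
project.grant("spec.pdf", "ordinary")
project.grant("spec.pdf", "external")
project.grant("test-data.csv", "ordinary")
project.grant("notes.txt", "bob")
```

Searching respects the restrictions because `search` filters through `can_read` rather than listing files first and hiding them later.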
Overall Plan
• Implement necessary user interfaces and backend
processing to facilitate the Collaboration use cases.
– Develop and document user interfaces to support flexible
access control and data exchange
– Store information as metadata in the form of triples to
support light-weight reasoning
• Virtuoso triple store
– Upload and store files (in the server’s file system)
respecting user-project access control restrictions
• Ubuntu, Java VM, Apache Tomcat Web Server
Pre-requisites
• Pre-populated set of authorized users (for
authentication)
– Realistically this will require significant scrutiny of a user
outside the collaboration platform.
• Simple access control architecture and mechanisms
(that can be extended further based on user feedback).
• Kno.e.sis prototype assumed availability of an ITAR
certified container to host the collaboration platform.
Thus, the development of additional infrastructure for
ITAR compliance was out of scope.
Public-Private Data Sharing
• Enhance publicly available datasets while
retaining intellectual property data privately for
businesses
Private data and metadata
(e.g. ongoing experimental processes, intellectual property data)
Selectively shared data and metadata
(e.g. with ongoing collaborators, licensed data)
Public data and metadata
(e.g., released products, material specifications)
Federated Architecture
Federal Endpoint:
1. User Authentication
2. Federated Semantic Query Processor
3. Semantics Mappings
Components: OEM partner A, OEM partner B, OEM supplier C. Each has:
– Private, Shared, and Public data
– AC (access control) Processor
– Semantic Query Processor
Principles of a Federation
• Each component controls access to its local data
independently (local autonomy).
• A query is decomposed into multiple sub-queries;
each sub-query is executed at one component.
• Results from sub-queries are combined by the
federated query processor, which controls global access.
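These principles can be sketched in a few lines. Everything here is an illustrative assumption (class and function names, the per-component policy table, the `ex:` data); decomposition is reduced to sending the same sub-query to every component, whereas a real federated processor would split the query according to what each source can answer.

```python
class Component:
    """One federation member; it alone decides what a user may see."""

    def __init__(self, name, records, policy):
        self.name = name
        self.records = records  # list of (tier, (subject, predicate, object))
        self.policy = policy    # user -> set of tiers that user may read

    def query(self, user, predicate):
        # Local autonomy: unknown users get public data only.
        tiers = self.policy.get(user, {"public"})
        return [t for tier, t in self.records
                if tier in tiers and t[1] == predicate]


def federated_query(components, user, predicate):
    """Send one sub-query per component and combine the results."""
    combined = []
    for component in components:
        for triple in component.query(user, predicate):
            if triple not in combined:  # drop duplicates across sources
                combined.append(triple)
    return combined


partner_a = Component(
    "OEM partner A",
    [("public", ("ex:AlloyX", "ex:hasSpec", "ex:Spec1")),
     ("private", ("ex:AlloyX", "ex:hasSpec", "ex:DraftSpec"))],
    policy={},
)
partner_b = Component(
    "OEM partner B",
    [("shared", ("ex:AlloyY", "ex:hasSpec", "ex:Spec2"))],
    policy={"alice": {"public", "shared"}},
)
alice_view = federated_query([partner_a, partner_b], "alice", "ex:hasSpec")
bob_view = federated_query([partner_a, partner_b], "bob", "ex:hasSpec")
```

The private draft never leaves partner A, and partner B's shared data reaches only the collaborator it has listed.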
Kno.e.sis Tools
• Doozer: Ontology creator from Wikipedia
category hierarchy
• Scooner: Tool for trailblazing using semantic
triples
• Kino: Faceted Search Engine
• iExplore: Visualize and navigate semantic /
linked data
• BLOOMS: Ontology alignment tool
Take Away
Use of semantic web technologies
can help overcome challenges associated with
data discovery, integration, and interoperability,
caused by data heterogeneity, and
use of provenance and access control
can help to share/exchange data reliably.
Thank you, and please visit us at
http://knoesis.org/
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 

Último (20)

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men  🔝Sambalpur🔝   Esc...
➥🔝 7737669865 🔝▻ Sambalpur Call-girls in Women Seeking Men 🔝Sambalpur🔝 Esc...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 

Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analytics, and Applications

  • 1. Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analytics, and Applications Krishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing 1
  • 2. Relevant Funded Projects : A Brush with Pain Points and Promise • Semantic Web-based Data Exchange and Interoperability for OEM-Supplier Collaboration (Pratt and Whitney) (2014-2015) • KDDM: Federated Semantic Services Platform for Open Materials Science and Engineering (AFRL) (2013-2016) • Computer Assisted Document Interpretation Tools. (NSF SBIR Phases I and II with Cohesia Corp.) (1999-2002) • Document => Materials and Process Specs (alloys) 2
  • 3. Selected URLs and Publications • http://www.knoesis.org/?q=research/semMat • http://wiki.knoesis.org/index.php/MaterialWays • Nishita Jaykumar, PavanKalyan Yallamelli, Vinh Nguyen, Sarasi Lalithsena, Krishnaprasad Thirunarayan, Amit Sheth. KnowledgeWiki: An OpenSource Tool for Creating Community-Curated Vocabulary, with a Use Case in Materials Science. In LDOW - WWW 2016. Montreal, Canada; 2016. • Vinh Nguyen, Olivier Bodenreider, Amit Sheth. Don't like RDF Reification? Making Statements about Statements using Singleton Property. 23rd International conference on World Wide Web (WWW 2014). NY: ACM; 2014. p. 759-770. • Krishnaprasad Thirunarayan, Amit Sheth, Kalpa Gunaratna, Vinh Nguyen, Siva Cheekula, Sarasi Lalithsena, Nishita Jaykumar, Swapnil Soni, Clare Paul. Architecture and Prototype for Materials Knowledge Management System using Semantic Web Technologies and Techniques: A Preliminary Report. WSU, 2014 3
  • 4. Selected URLs and Publications • Krishnaprasad Thirunarayan, On Embedding Machine- Processable Semantics into Documents, In: IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 7, pp. 1014- 1018, July 2005. • K. Thirunarayan, A. Berkovich, and D. Sokol, An Information Extraction Approach to Reorganizing and Summarizing Specifications, In: Information and Software Technology Journal, Vol. 47, Issue 4, pp. 215-232, 2005. • K. Thirunarayan, A. Berkovich, and D. Sokol, Semi-automatic Content Extraction from Specifications, In: Proceedings of 6th International Conference on Applications of Natural Language to Information Systems, LNCS 2553, pp. 40-51, June 2002. 4
  • 5. Outline • Domain Goals and Challenges • Utility and Continuum of Machine-Processable Semantics : An Architecture • What?: Nature of Data and Granularity of Semantics • Why?: Lightweight semantics and its benefits • How?: Community-ratified Ontologies + Semantic Annotations of Data and Documents + Linked Open Materials Data • Applications: • (Skip) Long-term Research: Processing Tabular Data • Integrating vocabularies : Matvocab KnowledgeWiki use case • Document Annotation : Biomaterials use case • Visualization and Navigation : iExplore • Private-Public Data Sharing • Conclusion
  • 6. Domain Goals and Challenges • Materials Science and Engineering Data and Document sharing, discovery, and application are possible only if domain scientists are able and willing to do so. • Technological challenges – Computational tools and repositories conducive to easy exchange, curation, attribution, and analysis of data • Cultural challenges – Proper protection, control, and credit for sharing data 6
  • 7. Our Thesis / Value Proposition Associating machine-processable semantics with materials science and engineering data and documents can help overcome challenges associated with data discovery, integration and interoperability caused by data heterogeneity. 7
  • 8. What?: Nature of Data • Structured Data (e.g., relational) • Semi-structured, Heterogeneous Documents (e.g., publications and technical specs which usually include text, numerics, units of measure, images and equations) • Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries) 8
  • 9. 9 Fragment of Materials and Process spec for: Ti Alloy Bars, Wire, Forgings, and Rings.
  • 10. What?: Granularity of Semantics and Applications: Examples • Synonyms – Chemistry, Chemical Composition, Chemical Analysis, ... – Bend Test, Bending, ... – Delivery Condition, Process/Surface Finish, Temper, "as received by purchaser", ... • Co-reference vs broadening/narrowing – Tubing vs welded tubing vs flash-welded part • Capturing characteristic-value pairs – Recognize and Normalize: “0.1 inch and under in nominal thickness” is translated to “Thickness <= 0.1 in”. – Glean elided characteristic: controlled term “solution heat treated” implies the attribute “heat treat type”. 10
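The "recognize and normalize" step on slide 10 can be illustrated with a minimal sketch. The regular expression, the operator mapping (treating "and over" as ">="), and the one-entry lookup table are assumptions made for illustration only; they are not the extraction tool described in the talk, and real specifications use far more varied phrasing.

```python
import re

# Hypothetical pattern for phrases such as
# "0.1 inch and under in nominal thickness".
PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>inch|in)\s+and\s+(?P<rel>under|over)"
    r"\s+in\s+(?:nominal\s+)?(?P<attr>\w+)",
    re.IGNORECASE,
)

# Assumed mappings: "and under" is read as <=, "and over" as >=.
OPERATORS = {"under": "<=", "over": ">="}
UNITS = {"inch": "in", "in": "in"}

# Controlled terms that imply an attribute that the text leaves out.
CONTROLLED_TERMS = {"solution heat treated": "heat treat type"}


def normalize(phrase):
    """Turn a dimensional phrase into 'Attribute op value unit', or None."""
    m = PATTERN.search(phrase)
    if m is None:
        return None
    attr = m.group("attr").capitalize()
    op = OPERATORS[m.group("rel").lower()]
    unit = UNITS[m.group("unit").lower()]
    return f"{attr} {op} {m.group('value')} {unit}"


def implied_attribute(term):
    """Return the attribute implied by a controlled term, or None."""
    return CONTROLLED_TERMS.get(term.lower())


print(normalize("0.1 inch and under in nominal thickness"))
print(implied_attribute("solution heat treated"))
```

A rule table of this kind is easy for domain experts to extend, which fits the lightweight, semi-automatic approach the slides argue for.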
  • 12. 1 • Ontology: Agreement about a common vocabulary/nomenclature, conceptual models and domain knowledge – Codified as Schema + Knowledge Base. – Agreement is what enables interoperability. – Formal machine processable description is what leads to automation.
  • 13. 2 • Semantic Annotation (Metadata Extraction): Associating meaning with data, or labeling data so it is more meaningful to the system and people. – Manual – Semi-automatic (automatic with human verification) – Automatic
  • 14. 3 • Reasoning/Computation: – Semantics enabled search – Data integration – Answering complex queries and making connections (paths, sub-graphs) – Analyses including pattern discovery, mining, hypothesis validation – Visualization
  • 15. How to integrate well? From Syntax to Semantics 15
  • 16. SSN Ontology 2 Interpreted data (deductive) [in OWL] e.g., threshold 1 Annotated Data [in RDF] e.g., label 0 Raw Data [in TEXT] e.g., number Using Semantics to Climb Levels of Abstraction: an example 3 Interpreted data (abductive) [in OWL] e.g., diagnosis Intellego “150” Systolic blood pressure of 150 mmHg Elevated Blood Pressure Hyperthyroidism …… 16
  • 17. Semantic Web Data Subject Object Predicate A triple is in the format (Subject, Predicate, Object). An RDF Dataset is a set of triples.
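As a rough illustration of "an RDF dataset is a set of triples", the sketch below models triples as plain Python tuples and adds a small pattern matcher. It deliberately avoids any RDF library; the `match` helper is a hypothetical name, and the identifiers are taken from the Linked Open Data slides later in the deck.

```python
# A triple is (subject, predicate, object); a dataset is a set of triples.
dataset = {
    ("db:John_F._Kennedy", "db:Profession", "db:Politician"),
    ("ex:John_Kennedy", "ex:authored_book", "ex:A_Nation_of_Immigrants"),
    ("db:John_F._Kennedy", "owl:sameAs", "ex:John_Kennedy"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# All statements whose subject is db:John_F._Kennedy.
for triple in sorted(match(dataset, s="db:John_F._Kennedy")):
    print(triple)
```

Because the dataset is a set, adding the same triple twice has no effect, which mirrors the set semantics of an RDF graph.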
  • 18. What?: Granularity of Semantics and Associated Applications • Lightweight semantics: File and document-level annotation to enable discovery and sharing • Richer semantics: Data-level annotation and extraction for semantic search and summarization • Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Materials Science Data 18
  • 19. Computer Assisted Document Extraction Tool: Tree/Structure view of the Spec; Typical view of the tagged Spec
  • 20. Computer Assisted Document Extraction Tool Example: Procedure Melt Methods View of the Original Spec Tagged Spec Tag Editor
  • 21. Computer Assisted Document Extraction Tool: The SDL; Few More Examples: Procedure Melt Methods; Tag Editor
  • 22. Why?: Benefits of Lightweight Semantics • Ease of use by domain experts – Faster and wider adoption, promoting evolution • Low upfront cost to support • Shallow semantics has wider applicability to a range of documents/data and appeal to a broader community • Bottom-line: “Learn to Walk before we Run” 22
  • 23. How?: Using Semantic Web Technologies. Machine-processable semantics is achieved by addressing • Syntactic Heterogeneity: Using XML syntax and the RDF data model (labelled graph structure) • Semantic Heterogeneity: – Using “common” controlled vocabularies, taxonomies and ontologies – Using federated data sources, exchanges, querying, and services
  • 24. How?: Ingredients for Semantics-based Cyber Infrastructure • Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies) • Ease registration, publishing, and discovery • Provide support for provenance and access control • Track data citation for credit for data sharing • Semi-automatic annotation of data and documents : Manual + Automatic 24
  • 25. How?: Search Continuum • Keyword-based full-text search • + Manually provided content and source metadata • Uses upper-level ontology • + Automatically extracted metadata • Map text to concepts/properties/values • Semantic + faceted search using background knowledge • + Deeper semi-automatic content annotation and extraction • Aggregating related pieces of information; conditioning • Integration and Interoperation • + Linked Open Material Science Data • + Federated and Faceted Querying and Services 25
  • 26. Linked Open Data • Use “URIs” as identifiers to describe things http://dbpedia.org/resource/John_F._Kennedy • Associate descriptions to the identifiers 26 db:John_F. _Kennedy db:Politician db:Profession
  • 27. Linked Open Data • Connect things together 27 db:John_F. _Kennedy db:Politician db:Profession ex:John_K ennedy ex:A_Nation _of_Immigra nts ex:authored_book owl:sameAs
  • 29. Title of data Selected from five tier vocabulary provided Keywords Type of data maps, excel files, images, text Data format structured or unstructured Description of data brief unstructured description of content Contact information of provider(s) name of provider(s), email for verification, lineage Spatial extent of data and reference system location Temporal extent of data date range in time or age range if not recent Date and type of Related Publication(s) Journal, Thesis, Agency report, not published Host site for publication Journal, Library, Personal computer Access restrictions copyright regulations Example: Lightweight Semantic Registration of Data 29
  • 30. System Architecture and Components 30
  • 31. Problems and A Practical Approach (“When rubber meets the road”) Deeper Issues: Semantic Formalization of Tabular Data 31 skip
  • 32. Nature of tables • Compact structures for sharing information – Minimize duplication • Types of Tables – Regular : Dense Grid with explicit schema information in terms of column and row headings => Tractable – Irregular: Sparse Grid with implicit schema and ad hoc placement of heading => Hard 32
  • 33. 33
  • 34. Challenges Associated with Typical Spreadsheet/Table • Meant for human consumption • Irregular : – Not simple rectangular grid • Heterogeneous – All rows not interpreted similarly • Complex – Meaning of each row and each column context dependent • Footnotes modify meaning of entries (esp. in materials and process specifications) 34
  • 35. Practical Semi-Automatic Content Extraction • DESIGN: Develop regular data structures that can be used to formalize tabular information. – Provide a natural expression of data – Provide semantics to data, thereby removing potential ambiguities – Enable automatic translation • USE: Manual population of regular tables and automatic translation into LOD 35
  • 36. 36 Our applications in Materials Genome Initiative
  • 37. Matvocab home page Search and discovery Annotate documents Visualize the knowledge base Query vocabulary View, edit, and add Create and process assertions
  • 38. 38 Vocabulary Creation / Curation N. Jaykumar, P. Yallamelli, V. Nguyen, S. Lalithsena, K. Thirunarayan, A. Sheth, C. Paul: KnowledgeWiki: An OpenSource Tool for Creating Community Curated Vocabulary, with a Use Case in Materials Science (Linked Data on the Web, World Wide Web Conference 2016)
  • 39. KnowledgeWiki: An OpenSource Tool for Creating Community-Curated Vocabulary, with a Use Case in Materials Science WWW - LDOW 2016, Canada Nishita Jaykumar, Pavankalyan Yallamelli, Vinh Nguyen, Sarasi Lalithsena, Krishnaprasad Thirunarayan, Amit Sheth Kno.e.sis, Wright State University Clare Paul *Air Force Research Laboratory, Wright-Patterson AFB
  • 40. 40 • Collaboration with AFRL Context for Research ASM HNDBK MIL HNDBK-5 MIL HNDBK-17 (Standardized Vocabularies) SKOS Dublin Core QUDT VAEM … Crowdsourcing from domain experts Consolidated vocabulary (MatVocab)
  • 41. 41 Motivating Example Facts: Name Definition Source A-Basis The mechanical property value is the value above which … ASM Handbook, Volume 21: Composites. ABasis A statistically-based material property; a 95% lower… Composite Materials Handbook - Volume 1. MIL-HDBK-17F-1F, 17 June 2002 A-Basis The lower of either a statistically calculated number… Metallic Materials and Elements for Aerospace Vehicle Structures, MIL- HDBK-5J, 31 January 2003
  • 42. Motivating Example. Facts (Name | Definition | Source): YoungsModulus | The ratio of normal stress to corresponding … | ASM Handbook, Volume 21: Composites. ModulusYoungs | The ratio of change in stress to change … | MIL-HDBK-17. • The same term has multiple definitions, each of which needs to be represented with its provenance information, such as source and time.
  • 43. 43 Related Work Auxiliary node approach A-Basis Auxiliary node1 … A statistically-based material … P26v P26s P580q P582q … • Properties represented in the wikidata model do not correspond to RDF properties • Ad hoc: Lack of formal semantics
  • 44. • Extension to Mediawiki • We use the Semantic Form extension of Semantic Mediawiki for our task • Inability to represent metadata about the metadata 44 Semantic Mediawiki http://www.slideshare.net/cool_uk/semantic-mediawiki-simple-tutorial Representing entities and simple metadata The '''United Kingdom''' is a country located in [[Located in::Europe]].
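The Semantic MediaWiki annotation on slide 44 uses the inline form `[[property::value]]`, with the page itself as the subject. The sketch below shows how such annotations map to simple triples. The regular expression and the `extract_annotations` function are hypothetical helpers written for this illustration; they are not part of Semantic MediaWiki.

```python
import re

# Matches [[Property::Value]] and [[Property::Value|display text]].
ANNOTATION = re.compile(r"\[\[([^\]:|]+)::([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_annotations(page_title, wikitext):
    """Return (page, property, value) triples for each inline annotation."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


text = "The '''United Kingdom''' is a country located in [[Located in::Europe]]."
print(extract_annotations("United Kingdom", text))
```

Each annotation yields exactly one statement about the page, which is why plain annotations of this form cannot carry metadata about the statement itself (the limitation noted on the slide).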
  • 45. 45
  • 46. 46 • Adopted the Singleton Property method for capturing triple metadata in SMW • Importing legacy data with provenance in bulk using the Singleton Property method • Importing existing RDF datasets with provenance into SMW for curation Our Approach
  • 47. Singleton Property Translation. Facts (Subject | Predicate | Object | Source | License): Autoclave | hasDefinition | “A closed vessel for producing…” | MIL-HDBK-17F-1F, 17 | All rights reserved. Singleton Property Facts (Subject | Predicate | Object): hasDefinition#1 | rdf:sp | hasDefinition; Autoclave | hasDefinition#1 | “A closed vessel for producing…”; hasDefinition#1 | hasSource | MIL-HDBK-17; hasDefinition#1 | hasLicense | All rights reserved. A singleton property represents one specific relationship between two entities under a certain context. It is assigned a URI, as any other property, and can be considered as a subproperty or an instance of a generic property. Source: "Don't like RDF reification?: making statements about statements using singleton property." Proceedings of the 23rd International Conference on World Wide Web. ACM, 2014.
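The translation on slide 47 can be sketched as a small function that mints a fresh singleton property for a fact and attaches the metadata to that property instead of to the subject. The function name, the `#n` naming scheme, and the use of a shared counter are assumptions for illustration. The slide abbreviates the linking predicate as `rdf:sp`; the sketch spells it out as `rdf:singletonPropertyOf`.

```python
from itertools import count


def to_singleton_triples(subject, predicate, obj, metadata, counter):
    """Translate one fact plus its metadata into singleton-property triples."""
    # Mint a property that is unique to this one statement.
    singleton = f"{predicate}#{next(counter)}"
    triples = [
        (singleton, "rdf:singletonPropertyOf", predicate),
        (subject, singleton, obj),
    ]
    # Metadata describes the statement, so it hangs off the singleton property.
    for key, value in metadata.items():
        triples.append((singleton, key, value))
    return triples


ids = count(1)
for t in to_singleton_triples(
    "Autoclave",
    "hasDefinition",
    "A closed vessel for producing…",
    {"hasSource": "MIL-HDBK-17", "hasLicense": "All rights reserved"},
    ids,
):
    print(t)
```

Passing the counter in explicitly keeps the function free of global state, so a bulk import can share one counter across all facts and guarantee distinct singleton names.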
  • 48. • Formal semantics defined • Scalable, e.g., to LOD • Compatible with existing standards – RDF, RDFS, SPARQL • Can be used to capture multiple types of metadata – Provenance, time, location 48 Why use Singleton Property? Fu, Gang, et al. "Exposing Provenance Metadata Using Different RDF Models." arXiv preprint arXiv:1509.02822 (2015). Nguyen, Vinh, Olivier Bodenreider, and Amit Sheth. Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What Works Well With Wikidata?." Proceedings of the 11th International Workshop on Scalable Semantic Web Knowledge Base Systems co-located with 14th International Semantic Web Conference (ISWC 2015), Bethlehem, PA, USA. 2015.
  • 49. 49 Singleton v/s Regular Template Autoclave Definition Text Image Source Rights Autoclave Definition Text Image Source Rights Source Rights
  • 50. 50 Regular Vs Singleton templates Subject Predicate Object Autoclave hasDefinition#1 “A closed vessel…” hasDefinition#1 singletonPropertyOf skos:definition hasDefinition#1 source “ASM Handbook” hasDefinition#1 license “Reproduced by…” Autoclave hasImage#1 “Image.jpg” hasImage#1 singletonPropertyOf mv:image Subject Predicate Object Autoclave hasDefinition “A closed vessel…” Autoclave source “ASM Handbook” Autoclave license “Reproduced by…” Autoclave hasImage “Image.jpg”
  • 52. • Properties of interest to domain experts: – Definition Text – Source – License – Creator – Abbreviation – Synonyms – Units – ….. 52 Use Case in Materials Science mv: is matvocab namespace
  • 53. Statistics of the Vocabulary Import
      No. | Use Case Type | SMW
      1 | Number of vocabularies imported | 3
      2 | Total number of terms imported from ASM | 1295
      3 | Total number of terms imported from MILHNDBK-5 | 19
      4 | Total number of terms imported from MILHNDBK-17 | 179
      5 | Total number of Singleton Templates created | 6
      6 | Total number of Regular Templates created | 5
      7 | Total number of pages created | 1,685
  • 55. Annotate, search, and track provenance • Vocabulary is used to annotate documents. • Annotated documents can be indexed. • Documents can be integrated reliably based on common terms of interest and provenance information. 55
  • 56. 56 Annotate documents using standard vocabulary
  • 57. • Explains the origin of an artifact, such as – How was it created? – Who created it? – When was it created? • Example: for a given material X – Which processes are involved in making the material and what are the relevant performance properties? – What are the inputs, control parameters and outputs of a process? – Which research/engineering team performed an experiment? Provenance Metadata
  • 58. 58 Capturing and Exploring provenance metadata - iExplore generic PMC prepreg generic hand lay-up generic PMC lay-up generic autoclave cure generic PMC subjected to subjected to yields yields
  • 59. 59 Capturing and Exploring Vocabulary Provenance - iExplore Definition Rights Source Vocabulary term
  • 60. Biomaterials Knowledge Extraction : Protein/Peptides/Amino Acids-Precious Metal Bindings • Recognition and extraction of crystalline surface patterns for precious metals (e.g., Gold/Silver surface patterns via Miller Indices - Au(100), Au(110), Ag(111)), protein/peptide/amino acid sequences, and indicators of binding relationship. – Example Input: They found that an alanine-substituted peptide (AYSSGAPPAPPF) exhibited the highest affinity for gold, while a proline-substituted peptide (AYPPGAPPMPPF) showed almost no affinity. 60
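A first pass at the recognition step on slide 60 might look like the sketch below. Both patterns are assumptions made for illustration: surfaces are limited to Au and Ag with three-digit Miller indices, and a peptide is taken to be a parenthesized run of at least five one-letter amino-acid codes. Detecting the binding relationship itself (for example "highest affinity" versus "almost no affinity") is left out.

```python
import re

# Metal surface written as element symbol plus Miller indices, e.g. Au(100).
SURFACE = re.compile(r"\b(Au|Ag)\((\d{3})\)")

# Parenthesized sequence over the 20 standard one-letter amino-acid codes.
PEPTIDE = re.compile(r"\(([ACDEFGHIKLMNPQRSTVWY]{5,})\)")


def find_surfaces(text):
    """Return (element, miller_index) pairs in order of appearance."""
    return SURFACE.findall(text)


def find_peptides(text):
    """Return peptide sequences in order of appearance."""
    return PEPTIDE.findall(text)


sentence = (
    "They found that an alanine-substituted peptide (AYSSGAPPAPPF) exhibited "
    "the highest affinity for gold, while a proline-substituted peptide "
    "(AYPPGAPPMPPF) showed almost no affinity."
)
print(find_peptides(sentence))
print(find_surfaces("Au(100), Au(110), Ag(111)"))
```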
  • 61. 61
  • 62. Semantic Web Based Data Exchange and Interoperability for OEM-Supplier Collaboration 62
  • 63. Goal and Example Accomplishment • Implement a Collaboration Platform using Semantic Web technology in the backend. – Semantic Web representation (RDF) and querying (SPARQL) hidden from the users (domain scientists) for convenience. • Example functionality incorporated in the “Beta” version of the PW-11 Collaboration Platform – Creation of a project by its owner and assigning users to groups (e.g., ordinary, external, foreign) in a project – Assigning access control rights based on group/user/file – Searching, requesting, and uploading files respecting access restrictions 63
  • 64. Overall Plan • Implement necessary user interfaces and backend processing to facilitate the Collaboration use cases. – Develop and document user interfaces to support flexible access control and data exchange – Store information as metadata in the form of triples to support light-weight reasoning • Virtuoso triple store – Upload and store files (in the server’s file system) respecting user-project access control restrictions • Ubuntu, Java VM, Apache Tomcat Web Server 64
  • 65. Pre-requisites • Pre-populated set of authorized users (for authentication) – Realistically this will require significant scrutiny of a user outside the collaboration platform. • Simple access control architecture and mechanisms (that can be extended further based on user feedback). • Kno.e.sis prototype assumed availability of an ITAR certified container to host the collaboration platform. Thus, the development of additional infrastructure for ITAR compliance was out of scope. 65
  • 66. Public-Private Data Sharing • Enhance publicly available datasets while retaining intellectual property data privately for businesses 66 Private data and metadata (e.g. ongoing experimental processes, intellectual property data) Selectively shared data and metadata (e.g. with ongoing collaborators, licensed data) Public data and metadata (e.g., released products, material specifications)
  • 67. OEM partner A Federated Architecture 67 Private Shared Public Federal Endpoint 1. User Authentication 2. Federated Semantic Query Processor AC Processor Semantic Query Processor OEM partner B Private Shared Public AC Processor Semantic Query Processor OEM supplier C Private Shared Public AC Processor Semantic Query Processor 3. Semantics Mappings
  • 68. Principles of a Federation • Each component controls access to its local data independently (local autonomy). • A query is decomposed to multiple sub-queries, each sub-query is executed at one component. • Results from sub-queries are combined by the federated query processor (control global access)
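The three federation principles can be sketched in a few lines. Everything here is a simplified assumption: the class and function names are hypothetical, data is tagged with the private / shared / public tiers from the data-sharing slide, and the "decomposition" simply sends the same predicate pattern to every component instead of splitting a complex query.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One federation member that enforces its own access control."""
    name: str
    owners: set = field(default_factory=set)
    collaborators: set = field(default_factory=set)
    # Each entry is (tier, triple), with tier in {"private", "shared", "public"}.
    store: list = field(default_factory=list)

    def allowed_tiers(self, user):
        if user in self.owners:
            return {"private", "shared", "public"}
        if user in self.collaborators:
            return {"shared", "public"}
        return {"public"}

    def query(self, user, predicate):
        # Local autonomy: the component decides what this user may see.
        tiers = self.allowed_tiers(user)
        return [t for tier, t in self.store if tier in tiers and t[1] == predicate]


def federated_query(components, user, predicate):
    """Run one sub-query per component and combine the results."""
    results = []
    for component in components:
        results.extend(component.query(user, predicate))
    return results


a = Component("OEM-A", owners={"alice"}, store=[
    ("private", ("alloyX", "hasYield", "950 MPa")),
    ("public", ("alloyY", "hasYield", "880 MPa")),
])
b = Component("Supplier-C", collaborators={"alice"}, store=[
    ("shared", ("alloyZ", "hasYield", "910 MPa")),
])
print(federated_query([a, b], "alice", "hasYield"))
print(federated_query([a, b], "guest", "hasYield"))
```

The federated processor never inspects tiers itself; it only merges what each component chooses to return, which is the local-autonomy property stated on the slide.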
  • 69. Kno.e.sis Tools • Doozer: Ontology creator from Wikipedia category hierarchy • Scooner: Tool for trailblazing using semantic triples • Kino: Faceted Search Engine • iExplore: Visualize and navigate semantic / linked data • BLOOMS: Ontology alignment tool 69
  • 70. Take Away Use of semantic web technologies can help overcome challenges associated with data discovery, integration, and interoperability, caused by data heterogeneity, and use of provenance and access control can help to share/exchange data reliably. 70
  • 71. 71 thank you, and please visit us at http://knoesis.org/ Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton, Ohio, USA Kno.e.sis