1. Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Integrating Semantic Web in the Real World:
A journey between two cities
OR
Resurrection of the Knowledge Engineer
Juan F. Sequeda
2018
Take Away Message
• Reflect on our journey to commercialize Semantic Web technology to address data integration and business intelligence needs.
Question
• Why is it so hard to deploy Semantic Web technologies in the real world?
• Answer:
1. History
2. Knowledge Engineer
3. Ontology/mapping engineering
• “Call to Arms”
History: Let’s put it in Today’s Context
[Timeline figure, 1970s to today, across Data, Logic, RDBMS and Semantic Web: Relational Algebra; Workshop on Logic and Data Bases, Toulouse 1977 (Gallaire, Nicolas & Minker); Semantic Networks; KL-ONE; Japanese 5th Generation Project; MCC, Austin, TX; Alvey Project, United Kingdom; Workshops on Expert Systems; Deductive Databases; KRDB; Views, Triggers; SQL99 Recursion; Description Logic; RDF; OWL]
Where we started in 2007… What is the relationship between Relational Databases and the Semantic Web?
[Figure: Relational Databases (Relational Model, Table Definition, Constraints, SQL, Triggers) alongside the Semantic Web (RDF, RDFS, OWL, SPARQL, Rules), evolving over TIME]
Sequeda et al. SQL Databases are a Moving Target. W3C Workshop on RDF Access on RDB. 2007
Over 10 years ago
• D2R (Map,Q,Server), Virtuoso RDF Views, SquirrelRDF, R2D2,
Relational.OWL, DB2OWL, R2O, Triplify, Dartgrid, RDBToOnto,
METAmorphoses, …
Some early issues with SPARQL to SQL wrappers
“Comparing the overall performance […] of the fastest rewriter with the fastest relational database shows an overhead for query rewriting of 106%. This is an indicator that there is still room for improving the rewriting algorithms.”
[Bizer and Schultz. Berlin SPARQL Benchmark 2009]
“Current rdb2rdf systems are not capable of providing the query execution performance required [...] it is likely that with more work on query translation, suitable mechanisms for translating queries could be developed. These mechanisms should focus on exploiting the underlying database system’s capabilities to optimize queries and process large quantities of structure data” [Gray et al. 2009]
https://sourceforge.net/p/d2rq-map/mailman/message/28055191/
Sept 2011
Why was this happening if …
ISWC 2008
OUR RESEARCH QUESTION
HOW and to what EXTENT can Relational Databases be integrated with the Semantic Web?
(1) Relational Databases → Semantic Web: Direct Mapping
[Figure: a relational database with schema R, constraints Σ and instance I, mapped to RDF and OWL as DM(R, Σ, I)]
• Formalization in Datalog
• Databases with NULLs
• Correctness of a Direct Mapping
• Information Preservation
• Query Preservation
• Monotonicity
• Semantics Preservation
• No monotone direct mapping is semantics preserving
On Directly Mapping Relational Databases to RDF and OWL. Sequeda, Arenas, Miranker. WWW 2012
Hypothesis: Relational Databases can be automatically mapped to RDF and OWL under a correct mapping
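The direct mapping idea can be illustrated with a small Python sketch. It is loosely modeled on the W3C Direct Mapping recommendation, not on the Datalog formalization of the WWW 2012 paper. The sketch makes several assumptions: single-column primary keys, a hypothetical base IRI, and every value rendered as a plain string literal (no datatypes, no percent-encoding). It emits no OWL, so it covers only the RDF half of the slide.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical base IRI; generated IRIs are all prefixed with it.
BASE = "http://example.com/base/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def row_iri(table: str, pk: str, value: object) -> str:
    """IRI for one row, built from its single-column primary key (no percent-encoding)."""
    return f"{BASE}{table}/{pk}={value}"


def direct_map(table: str, pk: str,
               rows: List[Dict[str, Optional[object]]],
               fks: Dict[str, Tuple[str, str]]) -> List[Triple]:
    """Map every row of one table to triples.

    fks maps a foreign-key column to (referenced table, referenced key column).
    """
    triples: List[Triple] = []
    for row in rows:
        s = row_iri(table, pk, row[pk])
        # Each row becomes an instance of a class named after its table.
        triples.append((s, RDF_TYPE, f"{BASE}{table}"))
        for col, val in row.items():
            if val is None:
                continue  # NULLs generate no triple at all
            # Literal triple: the predicate is named after table and column.
            triples.append((s, f"{BASE}{table}#{col}", f'"{val}"'))
            if col in fks:
                ref_table, ref_pk = fks[col]
                # Reference triple: links this row to the row it points to.
                triples.append((s, f"{BASE}{table}#ref-{col}",
                                row_iri(ref_table, ref_pk, val)))
    return triples


orders = [{"orderid": 1, "customerid": 7, "note": None}]
for t in direct_map("order", "orderid", orders, {"customerid": ("customer", "customerid")}):
    print(t)
```

Each row is mapped independently, so adding rows can only add triples. The sketch is therefore monotone, and NULL columns simply produce no triple.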
(2) Relational Databases ← Semantic Web: Ultrawrap
[Figure: Ultrawrap architecture: Relational Database; Direct Mapping; Mapping Compiler; Mapping as Views (Tripleview); SPARQL to SQL on Views; SQL Optimizer; Results]
Ultrawrap: SPARQL Execution on Relational Data. Sequeda & Miranker. J. Web Semantics 2013
• Chakravarthy, Grant and Minker. Logic-Based Approach to Semantic Query Optimization. TODS 1990
• Cheng et al. Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. VLDB 1999
• Semantic Query Optimization
• Detection of Unsatisfiable Conditions
• Self Join Elimination
• Commercial RDB
Hypothesis: Existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on relationally stored data
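The "mapping as views" idea can be sketched with Python's built-in sqlite3 module. The direct mapping of a hypothetical `orders` table is defined as a SQL view named `tripleview`. A two-pattern SPARQL query is then translated naively into a self-join over that view. The single string-typed view, the table and the IRI scheme are assumptions for illustration, not Ultrawrap's actual layout. SQLite is used only because it ships with Python, and the sketch makes no claim about which optimizations SQLite applies.

```python
import sqlite3

# Direct mapping of a hypothetical "orders" table, expressed as one SQL view.
SCHEMA = """
CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER, total REAL);
INSERT INTO orders VALUES (1, 7, 100.0), (2, 8, NULL), (3, 7, 42.5);

-- One UNION ALL branch per kind of triple; NULL values are filtered out.
CREATE VIEW tripleview AS
    SELECT 'order/orderid=' || orderid AS s, 'rdf:type' AS p, 'order' AS o
      FROM orders
    UNION ALL
    SELECT 'order/orderid=' || orderid, 'order#customerid', CAST(customerid AS TEXT)
      FROM orders WHERE customerid IS NOT NULL
    UNION ALL
    SELECT 'order/orderid=' || orderid, 'order#total', CAST(total AS TEXT)
      FROM orders WHERE total IS NOT NULL;
"""

# SPARQL:  SELECT ?x ?t WHERE { ?x <order#customerid> "7" . ?x <order#total> ?t }
# Naive translation: one alias of the view per triple pattern, joined on the subject.
SPARQL_AS_SQL = """
SELECT t1.s, t2.o
  FROM tripleview AS t1 JOIN tripleview AS t2 ON t1.s = t2.s
 WHERE t1.p = 'order#customerid' AND t1.o = '7'
   AND t2.p = 'order#total'
 ORDER BY t1.s
"""


def run_query() -> list:
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(SPARQL_AS_SQL).fetchall()
    finally:
        conn.close()


print(run_query())
```

The slide's hypothesis is that a commercial optimizer can rewrite such a query efficiently. It would expand the view and discard UNION ALL branches whose constant predicate can never match (detection of unsatisfiable conditions). It would also collapse the join of the view with itself on the same subject key (self-join elimination). The query would then end up close to a plain scan of `orders`.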
(3) Relational Databases ↔ Semantic Web: UltrawrapOBDA
[Figure: UltrawrapOBDA architecture: Relational Database; Mapping; OWL ontology (profiles EL, RL, QL, DL); Mapping Compiler; Saturated Mapping; Mapping as Views (Tripleview); SPARQL to SQL on Views; SQL Optimizer; Results; OWL, SQL]
OBDA: Query Rewriting or Materialization? In practice, Both! Sequeda, Arenas, Miranker. ISWC 2014 (Best Paper)
• Gallaire et al. Logic and Databases: A Deductive Approach. ACM Survey 1984
• Chaudhuri et al. Optimizing queries with materialized views. ICDE95
• Harinarayan et al. Implementing Data Cubes Efficiently. SIGMOD96
• Halevy. Answering queries using views: A survey. VLDBJ 2001
• Mami & Bellahsene. A Survey of View Selection Methods. SIGMOD Record 2012
• Commercial RDB
• Answering Queries using Views
• Rewriting using materialized views
• Recursion in SQL
Hypothesis: We can effect optimizations for Ontology-Based Data Access (OBDA) by pushing processing into the RDBMS, which thus acts as a reasoner.
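One ingredient listed on the slide, recursion in SQL, can be illustrated with a recursive common table expression. The CTE computes the transitive closure of a subclass hierarchy, so every instance inherits all superclasses of its asserted class. This is a hypothetical Python/sqlite3 sketch of that inference step only, with made-up class names. It is not the saturation algorithm from the ISWC 2014 paper.

```python
import sqlite3


def infer_types() -> list:
    """Return (instance, class) pairs, including classes inherited via subClassOf."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            -- Hypothetical ontology: subClassOf edges stored as a relation.
            CREATE TABLE subclassof (sub TEXT, sup TEXT);
            INSERT INTO subclassof VALUES ('OnlineOrder', 'Order'), ('Order', 'Transaction');
            -- Asserted types, e.g. as produced by a mapping.
            CREATE TABLE asserted_type (inst TEXT, cls TEXT);
            INSERT INTO asserted_type VALUES ('order/1', 'OnlineOrder'), ('order/2', 'Order');
        """)
        # Recursive CTE: reflexive-transitive closure of subClassOf, starting from
        # the asserted classes; each instance then inherits every superclass reached.
        return conn.execute("""
            WITH RECURSIVE closure(sub, sup) AS (
                SELECT cls, cls FROM asserted_type
                UNION
                SELECT c.sub, s.sup FROM closure AS c JOIN subclassof AS s ON c.sup = s.sub
            )
            SELECT DISTINCT t.inst, c.sup
              FROM asserted_type AS t JOIN closure AS c ON t.cls = c.sub
             ORDER BY t.inst, c.sup
        """).fetchall()
    finally:
        conn.close()


print(infer_types())
# → [('order/1', 'OnlineOrder'), ('order/1', 'Order'), ('order/1', 'Transaction'),
#    ('order/2', 'Order'), ('order/2', 'Transaction')]
```

The recursive CTE uses `UNION` rather than `UNION ALL`, so duplicate rows are discarded. The recursion therefore stops even if the hierarchy contains a cycle.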
HOW and to what EXTENT can RDB be integrated with the SW?
Direct Mapping (WWW 2012)
– HOW: RDB can be automatically directly mapped to RDF and OWL, formally defined in Datalog
– EXTENT: Direct Mappings can be Monotone, Information Preserving and Query Preserving. Monotonicity is an obstacle for Semantics Preservation
Ultrawrap (J. Web Sem 2013)
– HOW: Define mappings as SQL Views, which allows the RDB to do the optimization. Two important semantic query optimizations already exist in commercial RDBs
– EXTENT: SPARQL 1.0 (relational core)
Ultrawrap OBDA (ISWC 2014)
– HOW: RDB can act as a reasoner using Saturated Mappings, Query rewriting using Materialized Views and Recursion
– EXTENT: “OWL-SQL”: Ontologies with inheritance and transitivity
Where did our research journey take us?
[Figure: Oracle, SQL Server, Postgres, MySQL, IBM DB2; Enterprise Knowledge Graph]
• Sheth & Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Survey. 1990
• Carnot92, Infosleuth92, SIMS93, Information Manifold96, Lore96, TSIMMIS97, Kleisli99, Nimble01, Clio01, Sphinx04
Our Journey
https://constituteproject.org/
SEMANTIC CITY NON-SEMANTIC CITY
Data Integration and Business Intelligence
[Figure: IT and Biz separated by a Conceptualization Gap; Biz asks for “Total net sales of all Orders today” and receives Reports; example concepts: Order purchased by Customer]
Business Question
How many orders were placed in December 2018?
[Figure: the Billing, Shipping and E-Commerce systems return three different answers: 317,595; 317,124; 316,899]
What do you mean by …
What is an Order?
• E-Commerce: when a user clicks “Order” on the website
• Shipping: when the customer has received the product
• Billing: when it comes out of the billing system and the credit card (CC) has been charged
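The three definitions above can be made concrete with a toy example. The sketch below stores hypothetical timestamps for four orders in SQLite and counts "orders in December 2018" once per definition. The column names and data are invented. The point is only that one question over the same data can yield three different answers.

```python
import sqlite3

# Hypothetical event dates per order; None (NULL) means the event has not happened yet.
ROWS = [
    (1, "2018-12-03", "2018-12-06", "2018-12-03"),
    (2, "2018-12-30", None, None),                  # clicked, payment still pending
    (3, "2018-12-31", "2019-01-04", "2019-01-02"),  # clicked in Dec, charged and delivered in Jan
    (4, "2018-11-29", "2018-12-02", "2018-11-29"),  # clicked and charged in Nov, delivered in Dec
]

# One "Order" concept, three departmental definitions (column names are assumptions).
DEFINITIONS = {
    "E-Commerce": "clicked_at",   # user clicked "Order" on the website
    "Shipping": "delivered_at",   # customer has received the product
    "Billing": "charged_at",      # credit card has been charged
}


def orders_in_december_2018() -> dict:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (orderid INTEGER PRIMARY KEY, "
                     "clicked_at TEXT, delivered_at TEXT, charged_at TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)
        counts = {}
        for system, column in DEFINITIONS.items():
            # column comes from the fixed dict above, so interpolating it is safe here;
            # ISO date strings compare correctly as text.
            sql = (f"SELECT COUNT(*) FROM orders "
                   f"WHERE {column} BETWEEN '2018-12-01' AND '2018-12-31'")
            counts[system] = conn.execute(sql).fetchone()[0]
        return counts
    finally:
        conn.close()


print(orders_in_december_2018())
# → {'E-Commerce': 3, 'Shipping': 2, 'Billing': 1}
```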
Status Quo 1
[Figure: Biz asks IT for “Total net sales of all Orders today”; a Data Architect writes SQL (SELECT .. FROM …) over CSV files, MS Access and XLS across several iterations (T=1, T=2, T=3), returning Reports as XLS]
• Did the Biz User communicate the correct message to IT?
• Did IT correctly understand what the Biz User wanted?
• Did IT deliver correct, precise results?
https://www.wsj.com/articles/finance-pros-say-youll-have-to-pry-excel-out-of-their-cold-dead-hands-1512060948
Status Quo 2
[Figure: a Data Architect in IT feeds an Enterprise Data Warehouse through several ETL jobs to produce Reports for Biz; “Total net sales of all Orders today” becomes “Total net sales of all Orders today with FX”, at a cost in Time and $]
What is actually going on here
• Subject Matter Expert knows the business domain
• Dialog between users
• Understand the Domain (Target Ontology/Schema)
• Find where it is in the data (Source to Target Mappings)
• Sound familiar?
Giarratano & Riley. Expert Systems: Principles and Programming. 1989
Semantic Web (Ontology-Based Data Access/Integration/Management) can help... right?
Who creates this?
Using what tools?
IT IS NOT EASY!
HOWEVER
Chasm between the two Cities
G. Moore. Crossing the Chasm.
SEMANTIC CITY NON-SEMANTIC CITY
Chasm Observation 1: Boiling the Ocean
• Ontology Engineering
– Traditional ontology engineering methodologies
– Using competency questions
– Test driven development
– Ontology design patterns
– ...
• Mapping Engineering
– Ontology Matching/Alignment
– Schema Matching/Alignment
“There is not a right ontology, but a useful one.”
- F. van Harmelen
https://www.flickr.com/photos/eclogite/4950276577/
Chasm Observation 2: Real World Schemas are Hard
Chasm Observation 3: Real World Mappings are Hard
How to deal with NULLs and Duplicates in a mapping?
Should we care about NULL values?
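This sketch shows why NULLs and duplicates matter for a mapping. SQL tables are bags, while an RDF graph is a set of triples. The hypothetical mapping below deliberately takes a naive approach: it builds each row's subject from the row's values and skips NULLs. As a result, the duplicate row collapses into one set of triples, and the total changes. The W3C Direct Mapping avoids this: it gives each row of a table without a primary key its own blank node, which keeps duplicates apart.

```python
import sqlite3


def compare_counts() -> tuple:
    """Return (SQL row count, SQL sum, distinct RDF subjects, RDF sum)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical key-less table: one exact duplicate row and one NULL amount.
        conn.execute("CREATE TABLE payment (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO payment VALUES (?, ?)",
                         [("ann", 10.0), ("ann", 10.0), ("bob", None)])
        sql_rows = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
        sql_sum = conn.execute("SELECT SUM(amount) FROM payment").fetchone()[0]  # SUM skips NULL

        # Naive, hypothetical mapping: the subject IRI is built from the row's values,
        # so identical rows yield identical triples; NULL amounts yield no triple.
        triples = set()
        for customer, amount in conn.execute("SELECT customer, amount FROM payment"):
            subject = f"payment/{customer}/{amount}"
            triples.add((subject, "payment#customer", customer))
            if amount is not None:
                triples.add((subject, "payment#amount", amount))

        rdf_subjects = len({s for s, _, _ in triples})
        rdf_sum = sum(o for _, p, o in triples if p == "payment#amount")
        return sql_rows, sql_sum, rdf_subjects, rdf_sum
    finally:
        conn.close()


print(compare_counts())
# → (3, 20.0, 2, 10.0)
```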
Chasm Observation 4: Knowledge Hoarding
Power
Control
Job Security
Everybody comes to them. Makes them feel important.
New technologies and solutions are threats to the kingdom they have built and control
Chasm Observation 5: Tools are made for Semantic City
Is the solution to create new and
improved tools?
Use Machine Learning/Deep
Learning/... ?
The Resurrection of the Knowledge Engineer!
[Figure: the Knowledge Engineer (KE) between IT (Data Engineers) and Biz (Domain Experts); Business & Data Modeling; Data Access; “People Person”; “Geeky Person”]
D. Michie. Knowledge Engineering. Kybernetes 1973
E. Feigenbaum. The Art of Artificial Intelligence: Themes and case studies of knowledge engineering. 1977
Studer et al. Knowledge Engineering: Principles and methods. Data & Know. Engineering 1998
Google’s Job Posting for Linguist/Ontologist
https://careers.google.com/jobs#!t=jo&jid=/google/linguist-ontologist-google-knowledge-firebase-345-spear-st-san-francisco-ca-3182490028
Left Side Right Side
• Analyze graph structures
and content and develop
new semantic
representations.
• Make decisions and
provide guidance about
ontologies and semantic
representations.
• Write code to gather,
process, and analyze data
of various kinds.
• Work with researchers,
engineers, and linguists
to develop new
techniques for expansion,
improvement, and
analysis of the Knowledge
Graph.
Thomson Reuters’ Job Posting for Data Engineer
https://jobs.thomsonreuters.com/ShowJob/Id/54478/Data-engineer/
Left Side Right Side
• Experience in conceptualizing, designing
and building big data solutions and
integration systems with data from
multiple sources [...]
• Knowledge of Open Data initiatives and
Linked Data (semantic web) standards
• Familiarity with relational databases and
query languages such as SQL, as well as
Java, NoSQL databases, SPARQL, RDF and
graph technologies, and scripting
knowledge
• Comfortable with data massaging,
wrangling and concordance across
multiple sources and diverse formats [...]
• Basic knowledge/understanding of use of
ontologies [...]
• Experience in implementing end user
cases in the big data space; having
worked on the technical and business
aspects of data integration and business
intelligence [...]
• Content familiarity and willingness to
learn more, particularly around ‘pivot’
sets such as organizations, people, news,
metadata [...]; ability to forensically
analyze, decompose and model content
from originating sources
• Basic understanding of financial industry
and institutions and their business
• Knowledge of customer business
workflows [...]
• Strong analytical skills, ability to translate
business use-cases into functional
requirements [...]
• Excellent communications skill
Left Side Right Side
• Leading the design and development of
an enterprise ontology
• High focus on automation, and designing
scalable self curing processes
• Degree in Taxonomy, Knowledge
Management, Knowledge
Representation, Information Science,
Information Management
• Hands-on knowledge of one or more
interoperability standards, e.g., RDF, RDFS,
OWL, SKOS; Demonstrated expertise with
data querying (SQL, SPARQL, etc.)
• Demonstrated expertise with triple
stores and knowledge of their place in
the ecosystem of data caches and
delivery funnels;
• Proficiency with programming languages
such as Python or Java
• Work with domain experts to distill
domain specific knowledge into
computable elements
• Work in close collaboration with
Taxonomy Managers and Analysts,
Product Managers, and Software
Engineers ...
• ... Influence and/or advocate changes
within the organization at all levels
• In collaboration with subject matter
experts and engineers, develop the ingest
requirements
• Evangelize the role of Ontology and
Taxonomy
• Great verbal communication skills with
the ability to present complex technical
information in a clear and engaging
manner to a variety of technical and non-
technical audiences
https://mastercard.jobs/ofallon-mo/ontologist-enterprise-architecture/D649ADFEAA544EB692EBC90F886A7673/job/
And the list continues
• Uber
• Airbnb
• Facebook
• IBM
• eBay
• Pinterest
• Intuit
• Bosch
• ...
Knowledge Engineer vs Data Scientist
[Figure: Knowledge Engineer (KE) with IT and Biz; Data Scientist (DS) with IT and Domain (Biz) Experts]
“Most data scientists spend only 20 percent of their time
on actual data analysis and 80 percent of their time finding,
cleaning, and reorganizing huge amounts of data, which is
an inefficient data strategy”
https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html
We have to be careful! Let’s not forget about history
• Knowledge Engineering as a “transfer process” of human knowledge into a KB did not succeed in the 80s
• Why?
• Knowledge was collected via interviews with domain experts
• This did not scale
Studer et al. Knowledge Engineering: Principles and methods. Data & Know. Engineering 1998
How should the Knowledge
Engineer be empowered today in
order to be successful?
Idea 1: Pay-as-you-go Methodology
A Pay-As-You-Go Methodology for Ontology-Based Data Access. Sequeda & Miranker. IEEE Internet Computing 2017
Knowledge Engineering as a modeling process
- Studer et al. Knowledge Engineering: Principles and methods. Data & Know. Engineering 1998
- CommonKADS, MIKE, PROTÉGÉ, KEATS, VITAL, EXPECT
- METHONTOLOGY, ...
Idea 2: Extract mappings from Source Queries
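SELECT
  o.orderid, o.orderdate,
  -- net sales = order total - final tax - shipping cost;
  -- outside USD/CAD, the shipping tax is first netted out of the shipping cost
  o.ordertotal
  - ot.finaltax
  - CASE
      WHEN o.currencyid IN ('USD', 'CAD') THEN o.shippingcost
      ELSE o.shippingcost - ot.shippingtax
    END AS netsales,
  o.currencyid
FROM "order" o, ordertax ot      -- ORDER is a reserved word, so the table name is quoted
WHERE o.orderid = ot.orderid     -- join on the alias ot, not the bare table name
  AND o.statusid NOT IN (4, 5)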
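Idea 2 can be sketched by reusing the report query from the slide as the body of a mapping view. The business logic for net sales is then taken directly from a query that IT already runs. The table layouts and rows below are hypothetical, the target property name `netSales` is an assumption, and SQLite stands in for the source database.

```python
import sqlite3

# The report query from the slide, reused verbatim as the body of a mapping view.
SOURCE_QUERY = """
    SELECT o.orderid, o.orderdate,
           o.ordertotal - ot.finaltax
             - CASE WHEN o.currencyid IN ('USD', 'CAD') THEN o.shippingcost
                    ELSE o.shippingcost - ot.shippingtax
               END AS netsales,
           o.currencyid
      FROM "order" o, ordertax ot
     WHERE o.orderid = ot.orderid
       AND o.statusid NOT IN (4, 5)
"""


def extract_netsales_mapping() -> list:
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical source tables and rows.
        conn.executescript("""
            CREATE TABLE "order" (orderid INTEGER PRIMARY KEY, orderdate TEXT, ordertotal REAL,
                                  shippingcost REAL, currencyid TEXT, statusid INTEGER);
            CREATE TABLE ordertax (orderid INTEGER, finaltax REAL, shippingtax REAL);
            INSERT INTO "order" VALUES (1, '2018-12-03', 110.0, 10.0, 'USD', 1),
                                       (2, '2018-12-04', 220.0, 20.0, 'EUR', 1),
                                       (3, '2018-12-05',  50.0,  5.0, 'USD', 4);
            INSERT INTO ordertax VALUES (1, 8.0, 0.0), (2, 15.0, 3.0), (3, 2.0, 0.0);
        """)
        # Hypothetical mapping: the query becomes a view that defines the property netSales.
        conn.execute(f"CREATE VIEW map_order_netsales AS {SOURCE_QUERY}")
        rows = conn.execute("SELECT orderid, netsales FROM map_order_netsales ORDER BY orderid")
        return [(f"Order/{oid}", "netSales", net) for oid, net in rows]
    finally:
        conn.close()


print(extract_netsales_mapping())
# → [('Order/1', 'netSales', 92.0), ('Order/2', 'netSales', 188.0)]
```

Because the mapping is an ordinary view, the original WHERE clause still excludes the order with `statusid` 4, exactly as in the report.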
Idea 3: Revisit Data Integration from a Socio-technical View
[Figure: IT, Knowledge Engineer, Domain Expert, Data Scientist, Other Business People; Methodology; Tools]
Idea 4: Tools for the Knowledge Engineer
ACQUIST, AQUINAS, KEATS, WebODE ...
https://gra.fo/
Bridging the Chasm
[Figure: SEMANTIC CITY and NON-SEMANTIC CITY, bridged by the Knowledge Engineer and a Socio-technical approach]
Thanks to my collaborators
• Daniel Miranker (UT Austin)
• Marcelo Arenas (PUC Chile)
• Oscar Corcho (UPM)
• .. And many more
• Daniel Miranker
• Wayne Heideman
• Will Briggs
• Rick Liao
• Bill Rogers
• ... And many more
Takeaway Message
Why is it so hard to deploy Semantic Web technologies in the real world?
• History: Don’t reinvent the wheel. Know the History. Read pre-PDF papers.
• Knowledge Engineer: It’s back. And sexy.
• Ontology and Mapping Engineering challenges: New problems with humans in the loop (HITL)
1. We need to bridge the chasm between the Semantic and Non-Semantic Cities.
2. We need Knowledge Engineers, who need to be empowered with methodologies and tools.
3. CALL TO ARMS: We need to research socio-technical phenomena of data integration.
THANK YOU!
Juan Sequeda, Ph.D
Co-Founder – Capsenta
juan@capsenta.com
@juansequeda
Sequeda J. Integrating Relational Databases with the Semantic Web. IOS Press. 2016
http://www.iospress.nl/book/integrating-relational-databases-with-the-semantic-web/
We are always looking for smart people (and Knowledge Engineers!)