Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Integrating Semantic Web in the Real World:
A journey between two cities
OR
Resurrection of the Knowledge Engineer
Juan F. Sequeda
2018
Take Away Message
• Reflect on our journey to commercialize semantic
web technology to address data integration and
business intelligence needs.
Question
• Why is it so hard to deploy Semantic Web technologies in
the real world?
• Answer:
1. History
2. Knowledge Engineer
3. Ontology/mapping engineering
• “Call to Arms”
History
Let’s put it in Today’s Context
[Timeline figure: 1970s, 1980s, 1990s, 2000s, Today. Tracks: Data and Logic. Labels: Relational Algebra; RDBMS; Views; Triggers; SQL99 Recursion; Semantic Networks; KL-ONE; Description Logic; RDF; OWL; Semantic Web; Workshop on Logic and Data Bases, Toulouse 1977 (Gallaire, Nicolas & Minker); Workshops on Expert Systems; Deductive Databases; KRDB; Japanese 5th Generation Project; MCC (Austin, TX); Alvey Project (United Kingdom).]
Where we started in 2007… What is the relationship between Relational Databases and the Semantic Web?
[Figure, over TIME. Relational Databases (SQL): Relational Model, Table Definition, Constraints, Triggers. Semantic Web (SPARQL): RDF, RDFS, OWL, Rules.]
Sequeda et al. SQL Databases are a Moving Target. W3C Workshop on RDF Access on RDB. 2007
Over 10 years ago
• D2R (Map,Q,Server), Virtuoso RDF Views, SquirrelRDF, R2D2,
Relational.OWL, DB2OWL, R2O, Triplify, Dartgrid, RDBToOnto,
METAmorphoses, …
Some early issues with SPARQL to SQL wrappers
“Comparing the overall performance […] of the fastest rewriter with the fastest relational database shows an overhead for query rewriting of 106%. This is an indicator that there is still room for improving the rewriting algorithms.”
[Bizer and Schultz. Berlin SPARQL Benchmark 2009]
“Current rdb2rdf systems are not capable of providing the query execution performance required [...] it is likely that with more work on query translation, suitable mechanisms for translating queries could be developed. These mechanisms should focus on exploiting the underlying database system’s capabilities to optimize queries and process large quantities of structure data”
[Gray et al. 2009]
https://sourceforge.net/p/d2rq-map/mailman/message/28055191/
Sept 2011
Why was this happening if …
ISWC 2008
OUR RESEARCH QUESTION
HOW and to what EXTENT can Relational Databases be integrated with the Semantic Web?
(1) Relational Databases → Semantic Web: Direct Mapping
Hypothesis: Relational Databases can be automatically mapped to RDF and OWL under a correct mapping
[Figure: R, Σ and I are the inputs of the direct mapping DM(R, Σ, I)]
• Formalization in Datalog
• Databases with NULLs
• Correctness of a Direct Mapping
• Information Preservation
• Query Preservation
• Monotonicity
• Semantics Preservation
• No monotone direct mapping is semantics preserving
On Directly Mapping Relational Databases to RDF and OWL. Sequeda, Arenas, Miranker. WWW 2012
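As a rough illustration of what a direct mapping does, the sketch below turns the rows of one table into RDF-style triples: one type triple per row and one triple per non-NULL attribute. The base IRI, the table and the column names are hypothetical, and the IRI scheme is a simplification loosely modeled on the W3C Direct Mapping; it is not the Datalog formalization from the WWW 2012 paper.

```python
from typing import Iterator, Optional

BASE = "http://example.org/db/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Row = dict[str, Optional[object]]
Triple = tuple[str, str, object]


def direct_map(table: str, pk: str, rows: list[Row]) -> Iterator[Triple]:
    """Yield a type triple per row plus one triple per non-NULL column."""
    for row in rows:
        subject = f"{BASE}{table}/{pk}={row[pk]}"
        yield (subject, RDF_TYPE, f"{BASE}{table}")
        for column, value in row.items():
            if value is None:  # assumption: a NULL produces no triple
                continue
            yield (subject, f"{BASE}{table}#{column}", value)


# Hypothetical rows of an "orders" table keyed by orderid.
rows = [
    {"orderid": 1, "currencyid": "USD", "shippingcost": None},
    {"orderid": 2, "currencyid": "CAD", "shippingcost": 5.0},
]
triples = list(direct_map("orders", "orderid", rows))
```

Because the function only ever adds triples when rows are added, this toy mapping is monotone, which is the property the slide says conflicts with semantics preservation.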
(2) Relational Databases ← Semantic Web: Ultrawrap
Hypothesis: Existing commercial relational databases already subsume algorithms and optimizations needed to support effective SPARQL execution on relationally stored data
[Figure labels: Relational Database; Direct Mapping; Mapping as Views; Tripleview; Mapping Compiler; SPARQL to SQL on Views; SQL Optimizer; Results]
• Semantic Query Optimization
• Detection of Unsatisfiable Conditions
• Self Join Elimination
• Commercial RDB
• Chakravarthy, Grant and Minker. Logic-Based Approach to Semantic Query Optimization. TODS 1990
• Cheng et al. Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. VLDB 1999
Ultrawrap: SPARQL Execution on Relational Data. Sequeda & Miranker. J. Web Semantics 2013
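The "mapping as views" idea can be sketched with a single triple view and a SPARQL pattern translated by hand into a self-join on that view. The schema, the view name tripleview and the predicate names are hypothetical. SQLite is used only because it ships with Python; nothing here claims that its optimizer applies the semantic query optimizations the slide attributes to commercial databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 99.5), (11, 2, 20.0);

-- The mapping expressed as a view: every row becomes (s, p, o) triples.
CREATE VIEW tripleview AS
    SELECT 'customer/' || id AS s, 'name' AS p, name AS o FROM customer
    UNION ALL
    SELECT 'order/' || id, 'purchasedBy', 'customer/' || customer_id FROM orders
    UNION ALL
    SELECT 'order/' || id, 'total', CAST(total AS TEXT) FROM orders;
""")

# SPARQL: SELECT ?name WHERE { ?o :purchasedBy ?c . ?c :name ?name }
# translated naively into a self-join on the view; the database is left
# to decide how to evaluate it.
sql = """
SELECT t2.o AS name
FROM tripleview t1 JOIN tripleview t2 ON t1.o = t2.s
WHERE t1.p = 'purchasedBy' AND t2.p = 'name'
ORDER BY name
"""
names = [row[0] for row in conn.execute(sql)]
```

In this sketch the SPARQL-to-SQL step stays trivial, and all the work of pruning irrelevant UNION ALL branches and removing redundant joins is delegated to the SQL optimizer, which is the design choice the hypothesis rests on.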
(3) Relational Databases ↔ Semantic Web: UltrawrapOBDA
Hypothesis: We can effect optimizations for Ontology-Based Data Access (OBDA) by pushing processing into the RDBMS, thus acting as a reasoner.
[Figure labels: Relational Database; Mapping; Saturated Mapping; Mapping as Views; Tripleview; Mapping Compiler; SPARQL to SQL on Views; SQL Optimizer; Results; OWL profiles EL, RL, QL, DL; OWL-SQL]
• Commercial RDB
• Answering Queries using Views
• Rewriting using materialized views
• Recursion in SQL
• Gallaire et al. Logic and Databases: A Deductive Approach. ACM Computing Surveys 1984
• Chaudhuri et al. Optimizing queries with materialized views. ICDE 1995
• Harinarayan et al. Implementing Data Cubes Efficiently. SIGMOD 1996
• Halevy. Answering queries using views: A survey. VLDBJ 2001
• Mami & Bellahsene. A Survey of View Selection Methods. SIGMOD Record 2012
OBDA: Query Rewriting or Materialization? In practice, Both! Sequeda, Arenas, Miranker. ISWC 2014 (Best Paper)
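One way to picture a saturated mapping is to propagate each class's source queries to all of its superclasses ahead of time, so that answering a query over a superclass becomes a plain UNION executed by the database. The ontology, the mappings and the emp table below are hypothetical, and the sketch covers only class inheritance; it is not the algorithm from the ISWC 2014 paper.

```python
import sqlite3

# Hypothetical ontology: (subclass, superclass) axioms.
SUBCLASS_OF = [("Manager", "Employee"), ("Employee", "Person")]

# Hypothetical mappings: class -> SQL queries returning instance ids.
mappings = {
    "Manager": ["SELECT id FROM emp WHERE role = 'mgr'"],
    "Employee": ["SELECT id FROM emp"],
}


def saturate(mappings, subclass_of):
    """Copy each class's source queries to its superclasses until a fixpoint."""
    saturated = {cls: list(queries) for cls, queries in mappings.items()}
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_of:
            for query in saturated.get(sub, []):
                target = saturated.setdefault(sup, [])
                if query not in target:
                    target.append(query)
                    changed = True
    return saturated


saturated = saturate(mappings, SUBCLASS_OF)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (id INTEGER PRIMARY KEY, role TEXT);
INSERT INTO emp VALUES (1, 'mgr'), (2, 'dev');
""")
# "All Persons" needs no reasoning at query time: it is a UNION over the
# queries that saturation attached to Person.
person_sql = " UNION ".join(saturated["Person"])
persons = sorted(row[0] for row in conn.execute(person_sql))
```

Person had no mapping of its own here; after saturation it inherits both source queries, so the database alone answers the question.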
HOW and to what EXTENT can RDB be integrated with the SW?
Direct Mapping (WWW 2012)
  HOW: RDB can be automatically directly mapped to RDF and OWL, formally defined in Datalog
  EXTENT: Direct Mappings can be Monotone, Information Preserving and Query Preserving. Monotonicity is an obstacle for Semantics Preservation
Ultrawrap (J. Web Sem 2013)
  HOW: Define mappings as SQL Views, which allows the RDB to do the optimization. Two important existing semantic query optimizations in commercial RDB.
  EXTENT: SPARQL 1.0 (relational core)
Ultrawrap OBDA (ISWC 2014)
  HOW: RDB can act as a reasoner using Saturated Mappings, Query rewriting using Materialized Views and Recursion
  EXTENT: “OWL-SQL”: Ontologies with inheritance and transitivity
Where did our research journey take us?
[Figure labels: Oracle; SQL Server; Postgres; MySQL; IBM DB2; Enterprise Knowledge Graph]
• Sheth & Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys. 1990
• Carnot92, Infosleuth92, SIMS93, Information Manifold96, Lore96, TSIMMIS97, Kleisli99, Nimble01, Clio01, Sphinx04
Our Journey
https://constituteproject.org/
SEMANTIC CITY NON-SEMANTIC CITY
Data Integration and Business Intelligence
[Figure labels: IT; Biz; “Total net sales of all Orders today”; Reports; Conceptualization Gap; Order purchased by Customer]
Business Question
How many orders were placed in December 2018?
Answers: 317,595; 317,124; 316,899
Systems: Billing; Shipping; E-Commerce
What do you mean by …
What is an Order?
• When a user clicks “Order” on the website
• When the customer has received the product
• When it comes out of the billing system and the CC has been charged
Billing; Shipping; E-Commerce
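A small sketch of why the three definitions of an order lead to three different counts for the same month. The sample orders are invented for illustration and are not the figures from the slides; pairing each definition with a system name is also an assumption made here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Order:
    order_id: int
    clicked: date               # user clicked "Order" on the website
    charged: Optional[date]     # credit card charged by the billing system
    delivered: Optional[date]   # customer received the product


def in_month(d: Optional[date], year: int, month: int) -> bool:
    """True when the event happened in the given month; a missing event never counts."""
    return d is not None and (d.year, d.month) == (year, month)


# Hypothetical sample data.
orders = [
    Order(1, date(2018, 12, 1), date(2018, 12, 1), date(2018, 12, 5)),
    Order(2, date(2018, 12, 30), date(2018, 12, 31), date(2019, 1, 3)),
    Order(3, date(2018, 12, 31), None, None),  # never charged or delivered
]

counts = {
    "e-commerce (clicked)": sum(in_month(o.clicked, 2018, 12) for o in orders),
    "billing (charged)": sum(in_month(o.charged, 2018, 12) for o in orders),
    "shipping (delivered)": sum(in_month(o.delivered, 2018, 12) for o in orders),
}
```

Each count is a correct answer to its own definition, which is why the business question needs an agreed meaning of "Order" before any query is written.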
Status Quo 1
[Figure labels: IT; Biz; “Total net sales of all Orders today”; Data Architect; SELECT .. FROM …; csv; MS Access; XLS; T=1, T=2, T=3; Reports]
• Did the Biz User communicate the correct message to IT?
• Did IT understand correctly what the Biz User wanted?
• Did IT deliver the correct/precise results?
https://www.wsj.com/articles/finance-pros-say-youll-have-to-pry-excel-out-of-their-cold-dead-hands-1512060948
Status Quo 2
[Figure labels: IT; Biz; Data Architect; ETL (three times); Enterprise Data Warehouse; Reports; Time and $; “Total net sales of all Orders today”; “Total net sales of all Orders today with FX”]
What is actually going on here
• Subject Matter Expert knows the business domain
• Dialog between users
• Understand the Domain (Target Ontology/Schema)
• Find where it is in the data (Source to Target Mappings)
• Sound familiar?
Giarratano & Riley. Expert Systems: Principles and Programming. 1989
Semantic Web (Ontology-Based Data Access/Integration/Management) can help... right?
HOWEVER
Who creates this?
Using what tools?
IT IS NOT EASY!
Chasm between the two Cities
G. Moore. Crossing the Chasm.
SEMANTIC CITY NON-SEMANTIC CITY
Chasm Observation 1: Boiling the Ocean
• Ontology Engineering
– Traditional ontology engineering methodologies
– Using competency questions
– Test driven development
– Ontology design patterns
– ...
• Mapping Engineering
– Ontology Matching/Alignment
– Schema Matching/Alignment
“There is not a right ontology. But a useful”
- F. van Harmelen
https://www.flickr.com/photos/eclogite/4950276577/
Chasm Observation 2: Real World Schemas are Hard
Chasm Observation 3: Real World Mappings are Hard
How to deal with NULLs and Duplicates in a mapping?
Should we care about NULL values?
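One possible policy for these questions, sketched under stated assumptions: a NULL value yields no triple, a row with a NULL key is skipped, and duplicate rows collapse because triples are collected in a set. The column names and predicates are hypothetical, and other policies (for example, keeping duplicates or flagging NULL keys) may suit a given source better.

```python
def map_rows(rows, subject_col, predicate_by_col):
    """Map rows to a set of triples, skipping NULLs and collapsing duplicates."""
    triples = set()  # a set drops the repeated triples that duplicate rows produce
    for row in rows:
        subject = row.get(subject_col)
        if subject is None:  # no key means no subject: skip the whole row
            continue
        for col, predicate in predicate_by_col.items():
            value = row.get(col)
            if value is not None:  # policy choice: NULL yields no triple
                triples.add((f"order/{subject}", predicate, value))
    return triples


# Hypothetical rows with a duplicate and a NULL key.
rows = [
    {"orderid": 1, "currencyid": "USD", "coupon": None},
    {"orderid": 1, "currencyid": "USD", "coupon": None},     # duplicate row
    {"orderid": None, "currencyid": "EUR", "coupon": "X1"},  # NULL key
]
triples = map_rows(rows, "orderid", {"currencyid": "currency", "coupon": "coupon"})
```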
Chasm Observation 4: Knowledge Hoarding
Power
Control
Job Security
Everybody comes to them. Makes them feel important.
New technologies and solutions are threats to the kingdom they have built and control
Chasm Observation 5: Tools are made for Semantic City
Is the solution to create new and improved tools?
Use Machine Learning/Deep Learning/... ?
The Resurrection of the Knowledge Engineer!
[Figure labels: IT; Biz; KE (Knowledge Engineer); Data Engineers; Domain (Biz) Experts; Business & Data Modeling; Data Access; “People Person”; “Geeky Person”]
D. Michie. Knowledge Engineering. Kybernetes 1973
E. Feigenbaum. The Art of Artificial Intelligence: Themes and case studies of knowledge engineering. 1977
Studer et al. Knowledge Engineering: Principles and methods. Data & Know. Engineering 1998
Google’s Job Posting for Linguist/Ontologist
https://careers.google.com/jobs#!t=jo&jid=/google/linguist-ontologist-google-knowledge-firebase-345-spear-st-san-francisco-ca-3182490028
Left Side Right Side
• Analyze graph structures
and content and develop
new semantic
representations.
• Make decisions and
provide guidance about
ontologies and semantic
representations.
• Write code to gather,
process, and analyze data
of various kinds.
• Work with researchers,
engineers, and linguists
to develop new
techniques for expansion,
improvement, and
analysis of the Knowledge
Graph.
Thomson Reuters’ Job Posting for Data Engineer
https://jobs.thomsonreuters.com/ShowJob/Id/54478/Data-engineer/
Left Side Right Side
• Experience in conceptualizing, designing
and building big data solutions and
integration systems with data from
multiple sources [...]
• Knowledge of Open Data initiatives and
Linked Data (semantic web) standards
• Familiarity with relational databases and
query languages such as SQL, as well as
Java, NoSQL databases, SPARQL, RDF and
graph technologies, and scripting
knowledge
• Comfortable with data massaging,
wrangling and concordance across
multiple sources and diverse formats [...]
• Basic knowledge/understanding of use of
ontologies [...]
• Experience in implementing end user
cases in the big data space; having
worked on the technical and business
aspects of data integration and business
intelligence [...]
• Content familiarity and willingness to
learn more, particularly around ‘pivot’
sets such as organizations, people, news,
metadata [...]; ability to forensically
analyze, decompose and model content
from originating sources
• Basic understanding of financial industry
and institutions and their business
• Knowledge of customer business
workflows [...]
• Strong analytical skills, ability to translate
business use-cases into functional
requirements [...]
• Excellent communications skill
Mastercard’s Job Posting for Ontologist (Enterprise Architecture)
Left Side Right Side
• Leading the design and development of
an enterprise ontology
• High focus on automation, and designing
scalable self curing processes
• Degree in Taxonomy, Knowledge
Management, Knowledge
Representation, Information Science,
Information Management
• Hands on knowledge of one or more
interoperability standards, e.g. RDF, RDFS,
OWL, SKOS; Demonstrated expertise with
data querying (SQL, SPARQL etc)
• Demonstrated expertise with triple
stores and knowledge of their place in
the ecosystem of data caches and
delivery funnels;
• Proficiency with programming languages
such as Python or Java
• Work with domain experts to distill
domain specific knowledge into
computable elements
• Work in close collaboration with
Taxonomy Managers and Analysts,
Product Managers, and Software
Engineers ...
• ... Influence and/or advocate changes
within the organization at all levels
• In collaboration with subject matter
experts and engineers, develop the ingest
requirements
• Evangelize the role of Ontology and
Taxonomy
• Great verbal communication skills with
the ability to present complex technical
information in a clear and engaging
manner to a variety of technical and non-
technical audiences
https://mastercard.jobs/ofallon-mo/ontologist-enterprise-architecture/D649ADFEAA544EB692EBC90F886A7673/job/
And the list continues
• Uber
• Airbnb
• Facebook
• IBM
• Ebay
• Pinterest
• Intuit
• Bosch
• ...
Knowledge Engineer vs Data Scientist
[Figure labels: IT; Biz; KE (Knowledge Engineer); IT; Domain (Biz) Experts; DS (Data Scientist)]
“Most data scientists spend only 20 percent of their time
on actual data analysis and 80 percent of their time finding,
cleaning, and reorganizing huge amounts of data, which is
an inefficient data strategy”
https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html
We have to be careful! Let’s not forget about history
• Knowledge Engineering as a “transfer process” of
human knowledge to a KB during the 80s did not
succeed
• Why?
• Collect Knowledge via interviews with domain
experts
• Did not scale
Studer et al. Knowledge Engineering: Principles and methods. Data & Know. Engineering 1998
How should the Knowledge
Engineer be empowered today in
order to be successful?
Idea 1: Pay as you go Methodology
A Pay-As-You-Go Methodology for Ontology-Based Data Access. Sequeda & Miranker. IEEE Internet Computing 2017
Knowledge Engineering as a modeling process
- Studer et al. Knowledge Engineering: Principles and methods. Data & Know. Engineering 1998
- CommonKADS, MIKE, PROTÉGÉ, KEATS, VITAL, EXPECT
- METHONTOLOGY, ...
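Idea 2: Extract mappings from Source Queries
SELECT
  o.orderid, o.orderdate,
  -- net sales: order total minus tax minus shipping
  -- (shipping is taken net of shipping tax outside USD and CAD)
  o.ordertotal
    - ot.finaltax
    - CASE
        WHEN o.currencyid IN ('USD', 'CAD') THEN o.shippingcost
        ELSE o.shippingcost - ot.shippingtax
      END AS netsales,
  o.currencyid
FROM "order" o, ordertax ot  -- ORDER is a reserved word, so the table name is quoted
WHERE o.orderid = ot.orderid
  AND o.statusid NOT IN (4, 5)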
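A minimal sketch of how a source query like the one above could double as a mapping: the query result is treated as the source, and a small dictionary assigns result columns to ontology properties. The table is named orders here to avoid the reserved word, and the sample rows, the ex: prefix and the property name ex:netSales are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderid INTEGER, orderdate TEXT, ordertotal REAL,
                     shippingcost REAL, currencyid TEXT, statusid INTEGER);
CREATE TABLE ordertax (orderid INTEGER, finaltax REAL, shippingtax REAL);
INSERT INTO orders VALUES (1, '2018-12-01', 100.0, 10.0, 'USD', 1),
                          (2, '2018-12-02', 100.0, 10.0, 'EUR', 1),
                          (3, '2018-12-03', 50.0, 5.0, 'USD', 4);
INSERT INTO ordertax VALUES (1, 8.0, 1.0), (2, 8.0, 1.0), (3, 4.0, 0.5);
""")

# The business logic already written by the data architect.
SOURCE_QUERY = """
SELECT o.orderid,
       o.ordertotal - ot.finaltax
         - CASE WHEN o.currencyid IN ('USD', 'CAD') THEN o.shippingcost
                ELSE o.shippingcost - ot.shippingtax END AS netsales
FROM orders o JOIN ordertax ot ON o.orderid = ot.orderid
WHERE o.statusid NOT IN (4, 5)
ORDER BY o.orderid
"""

# Hypothetical mapping: result column -> ontology property.
MAPPING = {"netsales": "ex:netSales"}

triples = []
cursor = conn.execute(SOURCE_QUERY)
columns = [d[0] for d in cursor.description]
for row in cursor:
    record = dict(zip(columns, row))
    for column, prop in MAPPING.items():
        triples.append((f"ex:order/{record['orderid']}", prop, record[column]))
```

The definition of net sales stays in SQL, where it was first written down; the mapping only names which ontology property each result column populates.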
Idea 3: Revisit Data Integration from Socio-technical view
[Figure labels: IT; Knowledge Engineer; Domain Expert; Data Scientist; Other Business People; Methodology; Tools]
Idea 4: Tools for the Knowledge Engineer
ACQUIST, AQUINAS, KEATS, WebODE ...
https://gra.fo/
Bridging the Chasm
SEMANTIC CITY NON-SEMANTIC CITY
Knowledge Engineer
Socio-technical
Thanks to my collaborators
• Daniel Miranker (UT Austin)
• Marcelo Arenas (PUC Chile)
• Oscar Corcho (UPM)
• ... And many more
• Daniel Miranker
• Wayne Heideman
• Will Briggs
• Rick Liao
• Bill Rogers
• ... And many more
Takeaway Message
Why is it so hard to deploy Semantic Web technologies in the real world?
• Know the History: Don’t reinvent the wheel. Read pre-PDF papers.
• Knowledge Engineer: It’s back. And sexy.
• Ontology and Mapping Engineering challenges: New Problems with HITL
1. We need to bridge the chasm between the Semantic and Non-Semantic Cities.
2. We need Knowledge Engineers, who need to be empowered with methodologies and tools.
3. CALL TO ARMS: We need to research socio-technical phenomena of data integration.
THANK YOU!
Juan Sequeda, Ph.D
Co-Founder, Capsenta
juan@capsenta.com
@juansequeda
We are always looking for smart people (and Knowledge Engineers!)
Sequeda J. Integrating Relational Databases with the Semantic Web. IOS Press. 2016
http://www.iospress.nl/book/integrating-relational-databases-with-the-semantic-web/

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Integrating Semantic Web with the Real World - A Journey between Two Cities - 2018

  • 1. Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com. Integrating Semantic Web in the Real World: A journey between two cities, OR Resurrection of the Knowledge Engineer. Juan F. Sequeda, 2018
  • 3. Take Away Message: Reflect on our journey to commercialize semantic web technology to address data integration and business intelligence needs. Question: Why is it so hard to deploy Semantic Web technologies in the real world? Answer: (1) History, (2) Knowledge Engineer, (3) Ontology/mapping engineering. “Call to Arms”
  • 4. History: Let’s put in Today’s Context. [Timeline figure spanning the 1970s, 1980s, 1990s, 2000s and Today. Labels: Data; Logic; Relational Algebra; RDBMS; Views; Triggers; SQL99 Recursion; Semantic Networks; KL-ONE; Description Logic; RDF; OWL; Semantic Web; Workshop on Logic and Data Bases, Toulouse 1977 (Gallaire, Nicolas & Minker); Japanese 5th Generation Project; MCC, Austin, TX; Alvey Project, United Kingdom; Workshops on Expert Systems; Deductive Databases; KRDB.]
  • 5. Where we started in 2007… What is the relationship between Relational Databases and the Semantic Web? [Figure labels: Relational Databases: SQL, Relational Model, Table Definition, Constraints, Triggers. Semantic Web: SPARQL, RDF, RDFS, OWL, Rules. Axis: TIME.] Sequeda et al. SQL Databases are a Moving Target. W3C Workshop on RDF Access on RDB. 2007
  • 6. Over 10 years ago: D2R (Map, Q, Server), Virtuoso RDF Views, SquirrelRDF, R2D2, Relational.OWL, DB2OWL, R2O, Triplify, Dartgrid, RDBToOnto, METAmorphoses, …
  • 7. Some early issues with SPARQL to SQL wrappers. “Comparing the overall performance […] of the fastest rewriter with the fastest relational database shows an overhead for query rewriting of 106%. This is an indicator that there is still room for improving the rewriting algorithms.” [Bizer and Schultz. Berlin SPARQL Benchmark 2009] “Current rdb2rdf systems are not capable of providing the query execution performance required [...] it is likely that with more work on query translation, suitable mechanisms for translating queries could be developed. These mechanisms should focus on exploiting the underlying database system’s capabilities to optimize queries and process large quantities of structure data” [Gray et al. 2009]
  • 8. https://sourceforge.net/p/d2rq-map/mailman/message/28055191/ (Sept 2011)
  • 9. Why was this happening if … ISWC 2008. OUR RESEARCH QUESTION: HOW and to what EXTENT can Relational Databases be integrated with the Semantic Web?
  • 10. (1) Relational Databases → Semantic Web: Direct Mapping. Hypothesis: Relational Databases can be automatically mapped to RDF and OWL under a correct mapping. [Figure labels: R, Σ, I; DM(R, Σ, I).] Formalization in Datalog; Databases with NULLs; Correctness of a Direct Mapping; Information Preservation; Query Preservation; Monotonicity; Semantics Preservation; No monotone direct mapping is semantics preserving. On Directly Mapping Relational Databases to RDF and OWL. Sequeda, Arenas, Miranker. WWW 2012
  • 11. (2) Relational Databases ← Semantic Web: Ultrawrap. Hypothesis: Existing commercial relational databases already subsume algorithms and optimizations needed to support effective SPARQL execution on relationally stored data. [Architecture figure labels: Relational Database; Direct Mapping; Mapping as Views; Tripleview; Mapping Compiler; SPARQL to SQL on Views; SQL Optimizer; Results.] Commercial RDB: Semantic Query Optimization; Detection of Unsatisfiable Conditions; Self Join Elimination. Chakravarthy, Grant and Minker. Logic-Based Approach to Semantic Query Optimization. TODS 1990. Cheng et al. Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. VLDB 1999. Ultrawrap: SPARQL Execution on Relational Data. Sequeda & Miranker. J. Web Semantics 2013
  • 12. (3) Relational Databases ↔ Semantic Web: UltrawrapOBDA. Hypothesis: We can effect optimizations for Ontology-Based Data Access (OBDA) by pushing processing into the RDBMS, thus acting as a reasoner. [Architecture figure labels: Relational Database; Mapping; Saturated Mapping; Mapping as Views; Tripleview; Mapping Compiler; SPARQL to SQL on Views; SQL Optimizer; Results; OWL; EL; RL; QL; DL; OWL-SQL.] Commercial RDB: Answering Queries using Views; Rewriting using materialized views; Recursion in SQL. Gallaire et al. Logic and Databases: A Deductive Approach. ACM Computing Surveys 1984. Chaudhuri et al. Optimizing queries with materialized views. ICDE 1995. Harinarayan et al. Implementing Data Cubes Efficiently. SIGMOD 1996. Halevy. Answering queries using views: A survey. VLDBJ 2001. Mami & Bellahsene. A Survey of View Selection Methods. SIGMOD Record 2012. OBDA: Query Rewriting or Materialization? In practice, Both! Sequeda, Arenas, Miranker. ISWC 2014 (Best Paper)
  • 13. HOW and to what EXTENT can RDB be integrated with the SW? Direct Mapping (WWW2012): HOW: RDB can be automatically directly mapped to RDF and OWL, formally defined in Datalog. EXTENT: Direct Mappings can be Monotone, Information Preserving and Query Preserving. Monotonicity is an obstacle for Semantics Preservation. Ultrawrap (J. Web Sem 2013): HOW: Define mappings as SQL Views, which allows the RDB to do the optimization. Two important existing semantic query optimizations in commercial RDB. EXTENT: SPARQL 1.0 (relational core). Ultrawrap OBDA (ISWC2014): HOW: RDB can act as a reasoner using Saturated Mappings, Query rewriting using Materialized Views and Recursion. EXTENT: “OWL-SQL”: Ontologies with inheritance and transitivity.
  • 14. Where did our research journey take us? [Figure labels: Oracle; SQL Server; Postgres; MySQL; IBM DB2; Enterprise Knowledge Graph.] Sheth & Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys. 1990. Carnot92, Infosleuth92, SIMS93, Information Manifold96, Lore96, TSIMMIS97, Kleisli99, Nimble01, Clio01, Sphinx04
  • 15. Our Journey. https://constituteproject.org/
  • 16. SEMANTIC CITY; NON-SEMANTIC CITY
  • 17. Data Integration and Business Intelligence. [Figure labels: IT; Biz; Reports; “Total net sales of all Orders today”; Conceptualization Gap; Order purchased by Customer.]
  • 18. Business Question: How many orders were placed in December 2018? [Figure labels: Billing; Shipping; E-Commerce; 317,595; 317,124; 316,899.]
  • 19. What do you mean by … What is an Order? When a user clicks “Order” on the website; When the customer has received the product; When it comes out of the billing system and the CC has been charged. [Figure labels: Billing; Shipping; E-Commerce.]
  • 20. Status Quo 1. [Figure labels: IT; Biz; Data Architect; “Total net sales of all Orders today”; SELECT .. FROM …; csv; MS Access; XLS; Reports; T=1, T=2, T=3.] Did the Biz User communicate the correct message to IT? Did IT understand correctly what the Biz User wanted? Did IT deliver the correct/precise results? https://www.wsj.com/articles/finance-pros-say-youll-have-to-pry-excel-out-of-their-cold-dead-hands-1512060948
  • 21. Status Quo 2. [Figure labels: IT; Biz; Data Architect; ETL; Enterprise Data Warehouse; Reports; Time and $; “Total net sales of all Orders today”; “Total net sales of all Orders today with FX”.]
  • 22. What is actually going on here: Subject Matter Expert knows the business domain; Dialog between users; Understand the Domain; Find where it is in the data; Sound familiar? [Figure labels: (Target Ontology/Schema); (Source to Target Mappings).] Giarratano & Riley. Expert Systems: Principles and Programming. 1989
  • 23. Semantic Web (Ontology-Based Data Access/Integration/Management) can help... right? HOWEVER: Who creates this? Using what tools? IT IS NOT EASY!
  • 24. Chasm between the two Cities. SEMANTIC CITY; NON-SEMANTIC CITY. G. Moore. Crossing the Chasm.
  • 25. Chasm Observation 1: Boiling the Ocean. Ontology Engineering: Traditional ontology engineering methodologies; Using competency questions; Test driven development; Ontology design patterns; ... Mapping Engineering: Ontology Matching/Alignment; Schema Matching/Alignment. “There is not a right ontology. But a useful” - F. van Harmelen. https://www.flickr.com/photos/eclogite/4950276577/
  • 26. Chasm Observation 2: Real World Schemas are Hard
  • 27. Chasm Observation 3: Real World Mappings are Hard. How to deal with NULLs and Duplicates in a mapping? Should we care about NULL values?
  • 28. Chasm Observation 4: Knowledge Hoarding. Power; Control; Job Security. Everybody comes to them. Makes them feel important. New technologies and solutions are threats to the kingdom they have built and control.
  • 29. Chasm Observation 5: Tools are made for Semantic City
  • 30. Is the solution to create new and improved tools? Use Machine Learning/Deep Learning/...?
  • 31. The Resurrection of the Knowledge Engineer! [Figure labels: IT; Biz; KE: Knowledge Engineer; Data Engineers; Domain (Biz) Experts; Business & Data Modeling; Data Access; “People Person”; “Geeky Person”.] D. Michie. Knowledge Engineering. Kybernetes 1973. E. Feigenbaum. The Art of Artificial Intelligence: Themes and case studies of knowledge engineering. 1977. Studer et al. Knowledge Engineering: Principles and methods. Data & Know. Engineering 1998
  • 32. Google’s Job Posting for Linguist/Ontologist. [Annotations: Left Side; Right Side.] Analyze graph structures and content and develop new semantic representations. Make decisions and provide guidance about ontologies and semantic representations. Write code to gather, process, and analyze data of various kinds. Work with researchers, engineers, and linguists to develop new techniques for expansion, improvement, and analysis of the Knowledge Graph. https://careers.google.com/jobs#!t=jo&jid=/google/linguist-ontologist-google-knowledge-firebase-345-spear-st-san-francisco-ca-3182490028
  • 33. Thomson Reuters’ Job Posting for Data Engineer. [Annotations: Left Side; Right Side.] Experience in conceptualizing, designing and building big data solutions and integration systems with data from multiple sources [...]; Knowledge of Open Data initiatives and Linked Data (semantic web) standards; Familiarity with relational databases and query languages such as SQL, as well as Java, NoSQL databases, SPARQL, RDF and graph technologies, and scripting knowledge; Comfortable with data massaging, wrangling and concordance across multiple sources and diverse formats [...]; Basic knowledge/understanding of use of ontologies [...]; Experience in implementing end user cases in the big data space; having worked on the technical and business aspects of data integration and business intelligence [...]; Content familiarity and willingness to learn more, particularly around ‘pivot’ sets such as organizations, people, news, metadata [...]; ability to forensically analyze, decompose and model content from originating sources; Basic understanding of financial industry and institutions and their business; Knowledge of customer business workflows [...]; Strong analytical skills, ability to translate business use-cases into functional requirements [...]; Excellent communications skill. https://jobs.thomsonreuters.com/ShowJob/Id/54478/Data-engineer/
  • 34. [Annotations: Left Side; Right Side.] Leading the design and development of an enterprise ontology; High focus on automation, and designing scalable self curing processes; Degree in Taxonomy, Knowledge Management, Knowledge Representation, Information Science, Information Management; Hands on knowledge of one or more interoperability standards, e.g. RDF, RDFS, OWL, SKOS; Demonstrated expertise with data querying (SQL, SPARQL etc.); Demonstrated expertise with triple stores and knowledge of their place in the ecosystem of data caches and delivery funnels; Proficiency with programming languages such as Python or Java; Work with domain experts to distill domain specific knowledge into computable elements; Work in close collaboration with Taxonomy Managers and Analysts, Product Managers, and Software Engineers ...; ... Influence and/or advocate changes within the organization at all levels; In collaboration with subject matter experts and engineers, develop the ingest requirements; Evangelize the role of Ontology and Taxonomy; Great verbal communication skills with the ability to present complex technical information in a clear and engaging manner to a variety of technical and non-technical audiences. https://mastercard.jobs/ofallon-mo/ontologist-enterprise-architecture/D649ADFEAA544EB692EBC90F886A7673/job/
  • 36. And the list continues: Uber, Airbnb, Facebook, IBM, Ebay, Pinterest, Intuit, Bosch, ...
  • 37. Knowledge Engineer vs Data Scientist. [Figure labels: IT; Biz; KE: Knowledge Engineer; Domain (Biz) Experts; DS: Data Scientist.] “Most data scientists spend only 20 percent of their time on actual data analysis and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data, which is an inefficient data strategy” https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html
  • 38. We have to be careful! Let’s not forget about history. Knowledge Engineering as a “transfer process” of human knowledge to a KB during the 80s did not succeed. Why? Collect Knowledge via interviews with domain experts; Did not scale. Studer et al. Knowledge Engineering: Principles and methods. Data & Know. Engineering 1998
  • 39. How should the Knowledge Engineer be empowered today in order to be successful?
  • 40. Idea 1: Pay as you go Methodology. Knowledge Engineering as a modeling process: Studer et al. Knowledge Engineering: Principles and methods. Data & Know. Engineering 1998; CommonKADS, MIKE, PROTÉGÉ, KEATS, VITAL, EXPECT; METHONTOLOGY, ... A Pay-As-You-Go Methodology for Ontology-Based Data Access. Sequeda & Miranker. IEEE Internet Computing 2017
  • 41. Idea 2: Extract mappings from Source Queries. SELECT o.orderid, o.orderdate, o.ordertotal - ot.finaltax - CASE WHEN o.currencyid IN ('USD', 'CAD') THEN o.shippingcost ELSE o.shippingcost - ot.shippingtax END AS netsales, o.currencyid FROM "order" o, ordertax ot WHERE o.orderid = ot.orderid AND o.statusid NOT IN (4, 5)
  • 42. Idea 3: Revisit Data Integration from Socio-technical view. [Figure labels: IT; Knowledge Engineer; Domain Expert; Data Scientist; Other Business People; Methodology; Tools.]
  • 43. Idea 4: Tools for the Knowledge Engineer. ACQUIST, AQUINAS, KEATS, WebODE ... https://gra.fo/
  • 45. Bridging the Chasm. [Figure labels: SEMANTIC CITY; NON-SEMANTIC CITY; Knowledge Engineer; Socio-technical.]
  • 46. Thanks to my collaborators: Daniel Miranker (UT Austin), Marcelo Arenas (PUC Chile), Oscar Corcho (UPM), ... and many more. Daniel Miranker, Wayne Heideman, Will Briggs, Rick Liao, Bill Rogers, ... and many more.
  • 47. Takeaway Message. Why is it so hard to deploy Semantic Web technologies in the real world? [Figure labels: Don’t reinvent the wheel; Know the History; Read pre-pdf paper; Knowledge Engineer; It’s back; And sexy; Ontology and Mapping Engineering challenges; New Problems with HITL.] 1. We need to bridge the chasm between the Semantic and Non-Semantic Cities. 2. We need Knowledge Engineers, who need to be empowered with methodologies and tools. 3. CALL TO ARMS: We need to research socio-technical phenomena of data integration. Juan Sequeda, Ph.D, Co-Founder, Capsenta. juan@capsenta.com, @juansequeda. Sequeda J. Integrating Relational Databases with the Semantic Web. IOS Press. 2016. http://www.iospress.nl/book/integrating-relational-databases-with-the-semantic-web/ We are always looking for smart people (and Knowledge Engineers!) THANK YOU!
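
Slides 11 and 13 describe defining mappings as SQL views so that the relational database itself does the optimization, and slides 10 and 27 raise the question of NULLs in a mapping. The following is a minimal sketch of that idea, not Ultrawrap's actual implementation: the `orders` table, the `tripleview` view name, the IRI-like strings and the `match` helper are all hypothetical, and SQLite stands in for a commercial RDB. One possible choice for NULLs is shown, namely emitting no triple for a NULL value.

```python
import sqlite3


def build_tripleview(conn: sqlite3.Connection) -> None:
    """Create a toy table and expose it as (s, p, o) triples through a view."""
    conn.executescript(
        """
        CREATE TABLE orders (
            orderid    INTEGER PRIMARY KEY,
            orderdate  TEXT,
            currencyid TEXT
        );
        INSERT INTO orders VALUES (1, '2018-12-01', 'USD');
        INSERT INTO orders VALUES (2, '2018-12-02', NULL);

        -- One SELECT per kind of triple; NULL attribute values produce no triple.
        CREATE VIEW tripleview AS
            SELECT 'orders/' || orderid AS s, 'rdf:type' AS p, 'Orders' AS o
              FROM orders
            UNION ALL
            SELECT 'orders/' || orderid, 'orders#orderdate', orderdate
              FROM orders WHERE orderdate IS NOT NULL
            UNION ALL
            SELECT 'orders/' || orderid, 'orders#currencyid', currencyid
              FROM orders WHERE currencyid IS NOT NULL;
        """
    )


def match(conn: sqlite3.Connection, predicate: str) -> list:
    """Answer the triple pattern (?s, predicate, ?o) as plain SQL on the view."""
    rows = conn.execute(
        "SELECT s, o FROM tripleview WHERE p = ? ORDER BY s", (predicate,)
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
build_tripleview(conn)
currency_triples = match(conn, "orders#currencyid")
triple_count = conn.execute("SELECT COUNT(*) FROM tripleview").fetchone()[0]
```

Because the triple pattern is answered by an ordinary SQL query over a view, any rewriting or pruning of the `UNION ALL` branches is left to the database's own optimizer, which is the point the slides make. Whether to drop NULLs, as here, or represent them some other way is a mapping design decision the sketch does not settle.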