Extracting, Aligning, and Linking Data to Build Knowledge Graphs
Craig Knoblock
University of Southern California
Thanks to my collaborators: Pedro Szekely, Linhong Zhu, Majid Ghasemi-Gol, Mohsen Taheriyan, Minh Pham, and Steve Minton
Goal
USC Information Sciences Institute CC-By 2.0 2
From: raw, messy, disconnected (hard to query, analyze & visualize)
To: clean, organized, linked (easy to query, analyze & visualize)
Use Case: Human Trafficking
100 million pages
~ 100 Web sites
help victims
prosecute traffickers
Example: Investigating a Reported Victim
San Diego, where else?
DIG Interface: Find the locations where a
potential victim was advertised
Steps To Build a KG
[Figure: pipeline for building a knowledge graph]
Steps: Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface
Components shown: Crawling, Extraction, Mapping to Ontology, Entity Linking & Similarity, Knowledge Graph Deployment, Query & Visualization
Resources shown: schema.org, geonames, ElasticSearch, Graph DB
Data Acquisition
downloading relevant data
batch  real-time
Web pages Web service  database 
CSV  Excel  XML  JSON
Traditional Web Crawler
(e.g., Nutch, Scrapy)
Web Crawling
24/7
5,000 Pages/Hour
~100,000,000 pages
Total
Feature Extraction
from raw sources to structured data
• extraction from text
• extraction from structured Web pages
• extraction of image features
Extraction
Structured Extraction
Automated Extraction
[Minton et al., Inferlink]
• Title
• Description
• Seller
• Post Date
• Expiry Date
• Price
• Location
• Category
• Member Since
• Num Views
• Post ID
Automated Extraction
input: a pile of pages
Classify by Templates: pages clustered by template
Infer Extractor (applied to each cluster of pages): extractor
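The "classify by templates" step can be illustrated with a small sketch. It is not the Inferlink method from the slides; it assumes, purely for illustration, that two pages share a template when their sequence of opening HTML tags (with consecutive repeats collapsed) is identical. The names `template_signature` and `cluster_by_template` and the sample pages are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects the opening tags of a page in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def template_signature(html):
    """Return the tag sequence with consecutive duplicates collapsed.

    Collapsing repeats lets lists of different lengths share a signature.
    This is an assumed, very rough notion of "same template".
    """
    parser = TagSequence()
    parser.feed(html)
    signature = []
    for tag in parser.tags:
        if not signature or signature[-1] != tag:
            signature.append(tag)
    return tuple(signature)


def cluster_by_template(pages):
    """Group page names by their template signature."""
    clusters = defaultdict(list)
    for name, html in pages.items():
        clusters[template_signature(html)].append(name)
    return list(clusters.values())


# Made-up sample pages: two detail pages and one listing page.
pages = {
    "ad1.html": "<html><body><h1>Rifle</h1><p>Price: 700</p></body></html>",
    "ad2.html": "<html><body><h1>Stock</h1><p>Price: 50</p></body></html>",
    "list.html": "<html><body><ul><li>a</li><li>b</li><li>c</li></ul></body></html>",
}
clusters = cluster_by_template(pages)
print(clusters)
```

An extractor would then be inferred separately for each cluster, since pages in one cluster are assumed to place the same fields in the same positions.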
Unsupervised Extraction Tool
Pretty Good Extractions
Type     Want           Extracted
Extra    Jan. 23, 2015  Jan. 23, 2015 expires Feb
Partial  Jan. 23, 2015  Jan. 23
Extraction Evaluation
10 websites, 5 pages each
Field         Perfect       Pretty Good
Title         1.0 (50/50)   1.0 (50/50)
Desc          .76 (37/49)   .98 (48/49)
Seller        .95 (40/42)   .95 (40/42)
Date          .83 (40/48)   .83 (40/48)
Price         .87 (39/45)   .98 (44/45)
Loc           .51 (23/45)   .84 (38/45)
Cat           .68 (34/50)   .88 (44/50)
Member Since  1.0 (35/35)   1.0 (35/35)
Expires       .52 (15/29)   .55 (16/29)
Views         .76 (19/25)   1.0 (25/25)
ID            .97 (35/36)   1.0 (36/36)
Feature Alignment
from multiple schemas to a common domain schema
- CSV, Excel
- Database tables
- Web services
- Extractors
- Nomenclature
- Spelling
Multiple Schemas
Karma: Mapping Data to Ontologies
Inputs: services, relational sources, hierarchical sources
Ontology: Schema.org
Output: JSON-LD
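To make the mapping output concrete, the sketch below turns one source record into a JSON-LD document typed with a schema.org class. It does not use Karma or reproduce its API; the column names, the `COLUMN_TO_PROPERTY` table, and the sample record are hypothetical, and only the schema.org terms `Offer`, `name`, `price`, and `identifier` are real.

```python
import json

# Hypothetical mapping from source columns to schema.org properties.
# In a real system this mapping is what the alignment step has to produce.
COLUMN_TO_PROPERTY = {
    "title": "name",
    "asking_price": "price",
    "post_id": "identifier",
}


def row_to_jsonld(row):
    """Build a JSON-LD document of type schema.org/Offer from one record."""
    doc = {"@context": "https://schema.org", "@type": "Offer"}
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row.get(column)
        if value:  # skip missing or empty source values
            doc[prop] = value
    return doc


# Made-up record for illustration.
row = {"title": "Example item", "asking_price": "700", "post_id": "abc123"}
doc = row_to_jsonld(row)
print(json.dumps(doc, indent=2))
```

Records from different sources end up with the same property names, which is what lets later steps treat them uniformly.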
Semantic Labeling
[Pham et al., ISWC’16]
Classes: Offer, Place, Person
Properties: name, price, id, name
Column-1                                Column-2  Column-3          Column-4
British Lee-Enfield No 4 MK 2 still …   1,000                       68155c13de2f2532
Cabelas Millenium Revolver in .45 colt  700       1711 Anderson Rd  12155a1a2938bc1e
Learning Semantic Types
Requirements:
Learn from a small number of examples
Distinguish both string and numeric values
Can be learned quickly and is highly scalable to large
numbers of semantic types
Domain Ontology: Person (name, birthdate), City (name), State (name), Organization (name)
   name           date      city      state  workplace
1  Fred Collins   Oct 1959  Seattle   WA     Microsoft
2  Tina Peterson  May 1980  New York  NY     Google
Textual
Data
Learning Semantic Types
Textual Data
Treat each column of data as a document
Apply TF-IDF Cosine Similarity
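A minimal sketch of the textual case follows: each labeled column becomes one document, and an unlabeled column gets the label of the most similar document. The TF-IDF weighting used here (relative term frequency times a smoothed inverse document frequency) is one common variant and is an assumption, as are the function names. The values "Los Angeles", "Ann Lee", and "Boston" are made up; the others come from the example table.

```python
import math
import re
from collections import Counter


def tokenize(values):
    """Lowercase all cell values of a column and split them into tokens."""
    return [tok for v in values for tok in re.findall(r"[a-z0-9]+", v.lower())]


def tfidf_vectors(documents):
    """Map each document (label -> token list) to a sparse TF-IDF vector."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for label, tokens in documents.items():
        counts = Counter(tokens)
        vectors[label] = {
            tok: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


labeled = {
    "City name": ["Seattle", "New York", "Los Angeles"],
    "Person name": ["Fred Collins", "Tina Peterson", "Ann Lee"],
}
unlabeled = ["New York", "Seattle", "Boston"]

# Treat the unlabeled column as one more document so it shares the IDF weights.
documents = {label: tokenize(vals) for label, vals in labeled.items()}
documents["?"] = tokenize(unlabeled)
vectors = tfidf_vectors(documents)
scores = {label: cosine(vectors["?"], vectors[label]) for label in labeled}
best = max(scores, key=scores.get)
print(best, scores)
```

Ranking all labels by score, rather than keeping only the best one, gives the top-k predictions that the evaluation slides report.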
Numeric
Data
Learning Semantic Types
Numeric Data:
Apply statistical hypothesis testing to
determine which distribution fits best
Apply Kolmogorov-Smirnov Test
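For numeric columns, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical distribution functions of two samples. The sketch below computes only that statistic and uses it directly as a distance, which is a simplification of a full hypothesis test (no p-value is computed). The sample values and the labels "price" and "year" are made up for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum gap between two step functions occurs at a sample point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Columns already labeled with a semantic type (hypothetical values).
known = {
    "price": [700, 1000, 450, 820, 300],
    "year": [1959, 1980, 1975, 1990, 2001],
}
unknown = [650, 900, 500, 350]

distances = {label: ks_statistic(unknown, values) for label, values in known.items()}
best = min(distances, key=distances.get)
print(best, distances)
```

A smaller statistic means the unknown column's values are distributed more like the labeled column's values, so the label with the smallest distance is ranked first.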
Features for Semantic Labeling
• Features
– KS = Kolmogorov-Smirnov
– MW = Mann-Whitney
Combining the Features for Semantic Labeling
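Automatically Assigned Semantic Labels
Labels assigned to the six columns, in order: Offer name; CreativeWork fragment; Offer description; Offer identifier; Offer datePosted; CreativeWork fragment
Row 1
  Offer name: 35 Whelen Handi-Rifle
  CreativeWork fragment: No Tags
  Offer description: 35 Whelen Handi-rifle. Black synthetic stock/forearm, blued barrel. Text 601-813-7280 ….
  Offer identifier: 245625390711756
  Offer datePosted: October 19, 2015 12:43 pm
  CreativeWork fragment: (none)
Row 2
  Offer name: Cabelas Millenium Revolver in .45 colt
  CreativeWork fragment: No Tags
  Offer description: This single action is built to shoot and is a great way for any level of shooter to get involved with a single action. …
  Offer identifier: 12155a1a2938bc1e
  Offer datePosted: July 11, 2015 5:17 pm
  CreativeWork fragment: 1711 Anderson Rd
Row 3
  Offer name: swap stocks
  CreativeWork fragment: No Tags
  Offer description: want to trade butler creek folding stock for black stock ruger mini stock folder by butler creek will swap even for full rifle stock ….
  Offer identifier: 5815600fd181fe3b
  Offer datePosted: September 22, 2015 1:05 am
  CreativeWork fragment: white
streetAddress does not appear in training data -> more similar to noisy data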
Results on www.msguntrader.com
number of attributes 19
Correct prediction 16
Correct label is in the top 4 predictions 18
Accuracy 84%
MRR 89%
Results on Gun Sites
Evaluation Dataset
Average number of attributes 18
Total number of attributes 176
Correct prediction (Accuracy) 56%
Correct label is in the top 4 predictions 89%
MRR 70%
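The two metrics reported above can be computed as in the sketch below: accuracy counts attributes whose top-ranked label is correct, and MRR (mean reciprocal rank) averages 1/rank of the correct label, counting 0 when it is not in the ranking. The rankings and labels in the example are made up and are not the evaluation data.

```python
def accuracy_and_mrr(ranked_predictions, correct_labels):
    """Return (accuracy, mean reciprocal rank) for ranked label predictions."""
    hits = 0
    reciprocal_ranks = []
    for ranking, truth in zip(ranked_predictions, correct_labels):
        if ranking and ranking[0] == truth:
            hits += 1
        if truth in ranking:
            reciprocal_ranks.append(1.0 / (ranking.index(truth) + 1))
        else:
            reciprocal_ranks.append(0.0)  # correct label not predicted at all
    n = len(correct_labels)
    return hits / n, sum(reciprocal_ranks) / n


# Hypothetical ranked predictions for four attributes.
rankings = [
    ["name", "description"],
    ["price", "identifier"],
    ["description", "name"],
    ["fragment", "datePosted", "identifier"],
]
truths = ["name", "price", "name", "identifier"]

acc, mrr = accuracy_and_mrr(rankings, truths)
print(acc, mrr)
```

MRR is always at least as high as accuracy, because a correct label at rank 2 or lower still contributes partial credit.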
Entity Resolution
merging records that refer to the same entity
missing data
incorrect data
scale (~100 million records)
techniques to address
Unsupervised Collective Entity Resolution
same victim
same trafficker
Collective Entity Resolution
[Zhu et al, ISWC’16]
Identifying and linking instances of the same real world entity
[Figure: Multi-Type Graph with node types product, description, manufacturer, and price]
Products: Product 1, Product 2, Product 3, Product 4, Product 5
Descriptions: Quiet Comfort 25 Noise Cancelling Headphone; Noise Cancelling Headphones; Premium Noise Cancelling Headphones; Dish Washer; Bose Noise Cancelling Headphones
Manufacturers: Bose Electronic, Sony, Bosch, Bose
Prices: 292, 599, 229, 299
Common Approach: Pairwise Comparisons
Product    Price     Title                                        Manufacturer
Product 5  299       Quiet Comfort 25 Noise Cancelling Headphone  Bose Electronic
Product 4  299, 229  Bose Noise Cancelling Headphones             Bose
Product 3  599       Dish Washer                                  Bosch
Product 2  292       Premium Noise Cancelling Headphones          Sony
Product 1            Noise Cancelling Headphones                  Sony
Measures shown: Jaro 0.5, distance 0.2, Jaccard 0.3
Acceptance Threshold: 0.8
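A pairwise comparison of this kind can be sketched as a weighted sum of per-attribute similarities checked against the acceptance threshold. Several things here are assumptions: the slide does not say which measure or weight belongs to which attribute, so the assignment in `WEIGHTS` is a guess; `difflib.SequenceMatcher` stands in for Jaro, which is not in the Python standard library; and the price similarity formula is illustrative.

```python
from difflib import SequenceMatcher

# Assumed assignment of the slide's 0.5 / 0.2 / 0.3 values to attributes.
WEIGHTS = {"manufacturer": 0.5, "price": 0.2, "title": 0.3}
THRESHOLD = 0.8


def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def string_ratio(a, b):
    """Character-level similarity (stand-in for Jaro, not Jaro itself)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def price_similarity(p, q):
    """1.0 for equal prices, decreasing with the relative difference."""
    return 1.0 - abs(p - q) / max(p, q)


def record_similarity(r1, r2):
    """Weighted sum of the three attribute similarities."""
    return (
        WEIGHTS["manufacturer"] * string_ratio(r1["manufacturer"], r2["manufacturer"])
        + WEIGHTS["price"] * price_similarity(r1["price"], r2["price"])
        + WEIGHTS["title"] * jaccard(r1["title"], r2["title"])
    )


a = {"title": "Quiet Comfort 25 Noise Cancelling Headphone",
     "manufacturer": "Bose Electronic", "price": 299}
b = {"title": "Bose Noise Cancelling Headphones",
     "manufacturer": "Bose", "price": 299}
c = {"title": "Dish Washer", "manufacturer": "Bosch", "price": 599}

for name, other in (("b", b), ("c", c)):
    score = record_similarity(a, other)
    print(name, round(score, 3), "match" if score >= THRESHOLD else "no match")
```

With these assumed weights, records `a` and `b` score below the threshold even though both describe Bose noise cancelling headphones at the same price, which hints at why purely pairwise, attribute-by-attribute matching is brittle.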
Issues illustrated on the same table:
- Missing Values
- Multiple Values
- Weights (0.5, 0.2, 0.3)
- Unidirectional
Graph Summarization: Original Graph
[Figure: the multi-type product graph (products, descriptions, manufacturers, prices)]
Similar Nodes: sim_t(x, y)
Graph Summarization: Super-Nodes
Description nodes: Quiet Comfort 25 Noise Cancelling Headphone; Noise Cancelling Headphones; Premium Noise Cancelling Headphones; Dish Washer; Bose Noise Cancelling Headphones
Super-nodes C_t(x): probability that a node x belongs to each super-node
one matrix for each type
C_t =
  0.7 0.2 0.1
  0.7 0.2 0.1
  0.2 0.7 0.1
  0.2 0.7 0.1
  0.1 0.1 0.8
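One way to read the membership matrix is to assign each node to its most probable super-node. The sketch below does this hard assignment for the matrix values shown on the slide; the hard assignment itself is a simplification chosen for illustration, and rows are referred to by index because the slide text does not preserve which row belongs to which node.

```python
# Membership matrix from the slide: one row per node, one column per super-node.
C_t = [
    [0.7, 0.2, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]


def assign_super_nodes(membership):
    """Return, for each row, the index of the column with the largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in membership]


def group_by_super_node(assignments):
    """Map each super-node index to the list of node (row) indices in it."""
    groups = {}
    for node, super_node in enumerate(assignments):
        groups.setdefault(super_node, []).append(node)
    return groups


assignments = assign_super_nodes(C_t)
groups = group_by_super_node(assignments)
print(assignments)
print(groups)
```

Nodes that land in the same group are the candidates for being merged as the same entity.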
Similar Nodes Should Be In The Same Super-Node
Super-Links
Predict Links In Original Graph
Re-Clustering Improves Reconstruction Quality
[Figures: the product graph and its summary, with manufacturers Bose Electronic, Bosch, Bose and Products 3, 4, 5]
Comparable Approaches
Pairwise Clustering Unsupervised Supervised
Limes, Ngomo’11 ✔ ✔
SILK, Isele’10 ✔ ✔ ✔
Serf, Benjelloun’10 ✔ ✔
*Commercial, Kӧpcke’10 ✔ ✔
GraphSum, Riondato’14 ✔ ✔
*AuthorLDA, Bhattacharya’07 ✔ ✔
CoSum (proposed) ✔ ✔
Quality Comparison
          Precision              Recall                 F-measure
          Author  Paper  Product Author  Paper  Product Author  Paper  Product
Limes-F   0.958   0.827  0.446   0.864   0.761  0.16    0.909   0.792  0.236
Silk-F    0.846   0.877  0.459   0.986   0.756  0.348   0.91    0.812  0.395
Gsum      0.727   0.668  0.01    0.569   0.624  0.587   0.638   0.645  0.02
CoSum-B   0.993   0.871  0.58    0.94    0.611  0.477   0.966   0.718  0.524
Limes-MO  0.912   0.827  0.446   0.944   0.761  0.16    0.928   0.792  0.236
Silk-MO   0.932   0.877  0.459   0.958   0.756  0.348   0.945   0.812  0.395
Serf      0.985   0.837  0.436   0.687   0.808  0.186   0.809   0.822  0.261
CoSum-P   0.999   0.771  0.639   0.997   0.997  0.695   0.998   0.87   0.666
Commercial 0.615 0.63 0.622
AuthorLDA 0.995
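The F-measure values in the comparison are consistent with the usual harmonic mean of precision and recall. The short sketch below recomputes two entries from the table as a check; the function name is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Limes-F on the Author dataset: precision 0.958, recall 0.864.
print(round(f_measure(0.958, 0.864), 3))
# CoSum-P on the Product dataset: precision 0.639, recall 0.695.
print(round(f_measure(0.639, 0.695), 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a method with very low precision scores a low F-measure even when its recall is moderate, as in the Gsum Product column.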
Graph Construction
assembling the data for efficient query & analysis
- ElasticSearch: scalable, efficient query
- graph databases: network analytics
- NoSQL: scalable analytics
- bulk loading: massive data imports
- real-time updates: live, changing data
elasticsearch
• Cloud-based search engine
• Based on Apache Lucene
• Horizontal scaling, replication, load balancing
• Blazingly fast!
• Everything is a document
– Documents are JSON objects
– Index what you want to find
– Fields can contain strings, numbers, booleans, etc.
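As an illustration of "everything is a document", the sketch below builds one JSON document rooted at an offer, with the related person and Web page nested inside it so that a single lookup returns the connected data. The field names and values are hypothetical and are not the DIG schema; no ElasticSearch client call is shown, only the document body that would be indexed.

```python
import json


def offer_document(offer_id, title, price, seller_name, phone, page_url):
    """Build a nested, offer-rooted document (hypothetical field names)."""
    return {
        "id": offer_id,
        "type": "Offer",
        "title": title,          # string field
        "price": price,          # numeric field
        "seller": {
            "type": "Person",
            "name": seller_name,
            "phone": [phone],    # a list allows several phones per person
        },
        "page": {"type": "WebPage", "url": page_url},
    }


doc = offer_document(
    "offer-1", "Example listing", 100,
    "Example Seller", "555-0100", "http://example.com/ad/1",
)
body = json.dumps(doc, sort_keys=True)
print(body)
```

Nesting related entities under a root trades storage for speed: data is duplicated across documents, but a query needs no joins at read time.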
ElasticSearch Data Model
Efficient indexing and query
Entity types: AdultService, Offer, Person, Phone, WebPage
Offers As Roots
Products (AdultService) As Roots
Indexing for High Performance Knowledge Graph Queries
[Chart: Avg. Query Times in Milliseconds, Single User Query Load, 1.2 billion triples; compares a State of the Art Graph Database (RDF) with DIG indexing deployed in ElasticSearch]
DIG Deployment for Human Trafficking
- 100 million Web pages
- Live updates (~5,000 pages/hour)
- ElasticSearch database (7 nodes)
- Hadoop workflows (20 nodes)
- District Attorney
- Law Enforcement
- NGOs
DIG Applications
Human Trafficking
large, real users
Material Science Research
70,000 paper abstracts (built in 1 week)
Arms Trafficking
identify illegal sales
Patent Trolls
identifies patent trolls
Predicting Cyber Attacks
combines diverse sources about vulnerabilities,
exploits, etc.
Conclusions
• Presented the end-to-end tool-chain to
build domain-specific knowledge graphs
• Integrates heterogeneous data: web
pages, databases, CSV, web APIs,
images, etc.
• Approach scales to millions of pages and billions of facts
• Has been used to build real-world deployed applications

 
Lessons Learned in Building Linked Data for the American Art Collaborative
Lessons Learned in Building Linked Data for the American Art CollaborativeLessons Learned in Building Linked Data for the American Art Collaborative
Lessons Learned in Building Linked Data for the American Art Collaborative
 
Assigning semantic labels to data sources
Assigning semantic labels to data sourcesAssigning semantic labels to data sources
Assigning semantic labels to data sources
 
A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...
 
Building and Using a Knowledge Graph to Combat Human Trafficking
Building and Using a Knowledge Graph to Combat Human TraffickingBuilding and Using a Knowledge Graph to Combat Human Trafficking
Building and Using a Knowledge Graph to Combat Human Trafficking
 
From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked KnowledgeFrom Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
 
Semantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisSemantics for Big Data Integration and Analysis
Semantics for Big Data Integration and Analysis
 
A Semantic Approach to Retrieving, Linking, and Integrating Heterogeneous Ge...
A Semantic Approach to Retrieving, Linking, and  Integrating Heterogeneous Ge...A Semantic Approach to Retrieving, Linking, and  Integrating Heterogeneous Ge...
A Semantic Approach to Retrieving, Linking, and Integrating Heterogeneous Ge...
 
Discovering Alignments in Ontologies of Linked Data
Discovering Alignments in Ontologies of Linked DataDiscovering Alignments in Ontologies of Linked Data
Discovering Alignments in Ontologies of Linked Data
 

Último

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 

Último (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 

Extracting, Aligning, and Linking Data to Build Knowledge Graphs

  • 1. Extracting, Aligning, and Linking Data to Build Knowledge Graphs Craig Knoblock University of Southern California Thanks to my collaborators: Pedro Szekely, Linhong Zhu, Majid Ghasemi-Gol, Mohsen Taheriyan, Minh Pham, and Steve Minton
  • 2. Goal: from raw, messy, disconnected data (hard to query, analyze & visualize) to clean, organized, linked data (easy to query, analyze & visualize). USC Information Sciences Institute, CC-By 2.0
  • 3. Use Case: Human Trafficking. Raw, messy, disconnected (hard to query, analyze & visualize) to clean, organized, linked (easy to query, analyze & visualize)
  • 4. Use Case: Human Trafficking. 100 million pages; ~100 Web sites; help victims; prosecute traffickers
  • 5. Example: Investigating a Reported Victim. San Diego, where else?
  • 6. DIG Interface: Find the locations where a potential victim was advertised
  • 7. Steps To Build a KG. Pipeline: Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface. Diagram labels: Crawling, Extraction, Data Acquisition, Mapping To Ontology, Entity Linking & Similarity, Knowledge Graph Deployment, Query & Visualization, Elastic Search, Graph DB, schema.org, geonames. Highlighted step: Data Acquisition
  • 8. Data Acquisition: downloading relevant data; batch, real-time; Web pages, Web service, database, CSV, Excel, XML, JSON
  • 9. Traditional Web Crawler (e.g., Nutch, Scrapy)
  • 11. Steps To Build a KG (pipeline diagram as on slide 7)
  • 12. Feature Extraction: from raw sources to structured data; extraction from text; extraction from structured Web pages; extraction of image features
  • 13. Extraction
  • 15. Automated Extraction [Minton et al., Inferlink]. Fields: Title, Description, Seller, Post Date, Expiry Date, Price, Location, Category, Member Since, Num Views, Post ID
  • 16. Automated Extraction. Input: a pile of pages
  • 17. Automated Extraction. Input: a pile of pages; Classify by Templates; pages clustered by template
  • 18. Automated Extraction. Input: a pile of pages; Classify by Templates; pages clustered by template; Infer Extractor (x4); extractor
  • 19. Unsupervised Extraction Tool
  • 20. Pretty Good Extractions
       Kind      Want            Extracted
       Extra     Jan. 23, 2015   Jan. 23, 2015 expires Feb
       Partial   Jan. 23, 2015   Jan. 23
  • 21. Extraction Evaluation (10 websites, 5 pages each)
       Field          Perfect        Pretty Good
       Title          1.0 (50/50)    1.0 (50/50)
       Desc           .76 (37/49)    .98 (48/49)
       Seller         .95 (40/42)    .95 (40/42)
       Date           .83 (40/48)    .83 (40/48)
       Price          .87 (39/45)    .98 (44/45)
       Loc            .51 (23/45)    .84 (38/45)
       Cat            .68 (34/50)    .88 (44/50)
       Member Since   1.0 (35/35)    1.0 (35/35)
       Expires        .52 (15/29)    .55 (16/29)
       Views          .76 (19/25)    1.0 (25/25)
       ID             .97 (35/36)    1.0 (36/36)
  • 22. Steps To Build a KG (pipeline diagram as on slide 7)
  • 23. Feature Alignment: from multiple schemas to a common domain schema. Multiple schemas: CSV, Excel; database tables; Web services; extractors; nomenclature; spelling
  • 24. Karma: Mapping Data to Ontologies. Diagram labels: Services, Relational Sources, Hierarchical Sources, Karma, { JSON-LD }, Schema.org
  • 25. Semantic Labeling [Pham et al., ISWC’16]. Ontology labels: Offer, Place, Person, name, price, id, name, Offer. Columns: Column-1, Column-2, Column-3, Column-4. Row 1: British Lee-Enfield No 4 MK 2 still … | 1,000 | 68155c13de2f2532. Row 2: Cabelas Millenium Revolver in .45 colt | 700 | 1711 Anderson Rd | 12155a1a2938bc1e
  • 26. Learning Semantic Types. Requirements: learn from a small number of examples; distinguish both string and numeric values; can be learned quickly and is highly scalable to large numbers of semantic types. Domain Ontology labels: Person, Organization, City, State, name, birthdate, name, name, name
       Person   name            date       city       state   workplace
       1        Fred Collins    Oct 1959   Seattle    WA      Microsoft
       2        Tina Peterson   May 1980   New York   NY      Google
  • 27. Learning Semantic Types, Textual Data: treat each column of data as a document; apply TF-IDF cosine similarity
  • 28. Learning Semantic Types, Numeric Data: apply statistical hypothesis testing to determine which distribution fits best; apply the Kolmogorov-Smirnov test
  • 29. Features for Semantic Labeling. Features: KS = Kolmogorov-Smirnov; MW = Mann-Whitney
  • 30. Combining the Features for Semantic Labeling
  • 31. Automatically Assigned Semantic Labels
       Assigned labels: Offer name | CreativeWork fragment | Offer description | Offer identifier | Offer datePosted | CreativeWork Fragment
       Row 1: 35 Whelen Handi-Rifle | No Tags | 35 Whelen Handi-rifle. Black synthetic stock/forearm, blued barrel. Text 601-813-7280 …. | 245625390711756 | October 19, 2015 12:43 pm
       Row 2: Cabelas Millenium Revolver in .45 colt | No Tags | This single action is built to shoot and is a great way for any level of shooter to get involved with a single action. … | 12155a1a2938bc1e | July 11, 2015 5:17 pm | 1711 Anderson Rd
       Row 3: swap stocks | No Tags | want to trade butler creek folding stock for black stock ruger mini stock folder by butler creek will swap even for full rifle stock …. | 5815600fd181fe3b | September 22, 2015 1:05 am | white
       Note: streetAddress does not appear in training data -> more similar to noisy data
  • 32. Results on www.msguntrader.com. Number of attributes: 19; correct prediction: 16; correct label is in the top 4 predictions: 18; accuracy: 84%; MRR: 89%
  • 33. Results on Gun Sites Evaluation Dataset. Average number of attributes: 18; total number of attributes: 176; correct prediction (accuracy): 56%; correct label is in the top 4 predictions: 89%; MRR: 70%
  • 34. Steps To Build a KG (pipeline diagram as on slide 7)
  • 35. Entity Resolution: merging records that refer to the same entity. Techniques to address: missing data; incorrect data; scale (~100 million records)
  • 36. Unsupervised Collective Entity Resolution
  • 37. Unsupervised Collective Entity Resolution: same victim, same trafficker
  • 38. Collective Entity Resolution [Zhu et al, ISWC’16]: identifying and linking instances of the same real world entity. Multi-Type Graph with node types product, price, description, manufacturer. Products: Product 1 to Product 5. Prices: 292, 599, 229, 299. Descriptions: Quiet Comfort 25 Noise Cancelling Headphone; Noise Cancelling Headphones; Premium Noise Cancelling Headphones; Dish Washer; Bose Noise Cancelling Headphones. Manufacturers: Bose Electronic, Sony, Bosch, Bose
  • 39. Collective Entity Resolution [Zhu et al, ISWC’16]: identifying and linking instances of the same real world entity (same multi-type graph as slide 38)
  • 40. Common Approach: Pairwise Comparisons. Similarity measures and weights: Jaro 0.5, distance 0.2, Jaccard 0.3. Acceptance Threshold: 0.8. Records are labeled Product 1 to Product 5
       Price      Title                                         Manufacturer
       299        Quiet Comfort 25 Noise Cancelling Headphone   Bose Electronic
       299, 229   Bose Noise Cancelling Headphones              Bose
       599        Dish Washer                                   Bosch
       292        Premium Noise Cancelling Headphones           Sony
       (none)     Noise Cancelling Headphones                   Sony
  • 41. Missing Values (same product table and measures as slide 40)
  • 42. Multiple Values (same product table and measures as slide 40)
  • 43. Weights (same product table as slide 40; weights 0.5, 0.2, 0.3)
  • 44. Unidirectional (same product table as slide 40; weights 0.5, 0.2, 0.3)
  • 45. Graph Summarization: Original Graph (multi-type graph as on slide 38; node types price, description, manufacturer, product)
  • 47. Graph Summarization: Super-Nodes (same graph nodes as slide 45)
  • 48. Super-nodes: Ct(x) is the probability that a node x belongs to each super-node; one matrix Ct for each type. Description nodes shown: Quiet Comfort 25 Noise Cancelling Headphone; Noise Cancelling Headphones; Premium Noise Cancelling Headphones; Dish Washer; Bose Noise Cancelling Headphones. Matrix rows: (0.7, 0.2, 0.1), (0.7, 0.2, 0.1), (0.2, 0.7, 0.1), (0.2, 0.7, 0.1), (0.1, 0.1, 0.8)
  • 49. Similar Nodes Should Be In The Same Super-Node. Nodes shown: Noise Cancelling Headphones; Premium Noise Cancelling Headphones; Dish Washer; Quiet Comfort 25 Noise Cancelling Headphone; Bose Noise Cancelling Headphones
  • 53. Predict Links In Original Graph. Nodes shown: Bose Electronic, Bosch, Bose, Product 3, Product 4, Product 5
  • 54. Predict Links In Original Graph. Nodes shown: Bose Electronic, Bosch, Bose, Product 3, Product 4, Product 5
  • 56. Comparable Approaches. Columns: Pairwise, Clustering, Unsupervised, Supervised. Limes, Ngomo’11: ✔ ✔; SILK, Isele’10: ✔ ✔ ✔; Serf, Benjelloun’10: ✔ ✔; *Commercial, Köpcke’10: ✔ ✔; GraphSum, Riondato’14: ✔ ✔; *AuthorLDA, Bhattacharya’07: ✔ ✔; CoSum (proposed): ✔ ✔
  • 57. Quality Comparison
                  Precision                 Recall                    F-measure
       Method     Author  Paper   Product   Author  Paper   Product   Author  Paper   Product
       Limes-F    0.958   0.827   0.446     0.864   0.761   0.16      0.909   0.792   0.236
       Silk-F     0.846   0.877   0.459     0.986   0.756   0.348     0.91    0.812   0.395
       Gsum       0.727   0.668   0.01      0.569   0.624   0.587     0.638   0.645   0.02
       CoSum-B    0.993   0.871   0.58      0.94    0.611   0.477     0.966   0.718   0.524
       Limes-MO   0.912   0.827   0.446     0.944   0.761   0.16      0.928   0.792   0.236
       Silk-MO    0.932   0.877   0.459     0.958   0.756   0.348     0.945   0.812   0.395
       Serf       0.985   0.837   0.436     0.687   0.808   0.186     0.809   0.822   0.261
       CoSum-P    0.999   0.771   0.639     0.997   0.997   0.695     0.998   0.87    0.666
       Other values listed: Commercial 0.615 0.63 0.622; AuthorLDA 0.995
  • 58. Steps To Build a KG (pipeline diagram as on slide 7)
  • 59. Graph Construction: assembling the data for efficient query & analysis. ElasticSearch: scalable, efficient query; graph databases: network analytics; NoSQL: scalable analytics; bulk loading: massive data imports; real-time updates: live, changing data
  • 60. elasticsearch: cloud-based search engine; based on Apache Lucene; horizontal scaling, replication, load balancing; blazingly fast! Everything is a document: documents are JSON objects; index what you want to find; fields can contain strings, numbers, booleans, etc.
  • 61.
  • 62. ElasticSearch Data Model: efficient indexing and query. Diagram labels: Adult Service, Offer, Person, Phone, Web Page
  • 65. Indexing for High Performance Knowledge Graph Queries. Chart: avg. query times in milliseconds; single user query load; 1.2 billion triples. Systems compared: State of the Art Graph Database (RDF); DIG indexing deployed in ElasticSearch
  • 66. Steps To Build a KG (pipeline diagram as on slide 7)
  • 67.
  • 68. DIG Deployment for Human Trafficking: 100 million Web pages; live updates (~5,000 pages/hour); ElasticSearch database (7 nodes); Hadoop workflows (20 nodes); District Attorney; Law Enforcement; NGOs
  • 69. DIG Applications: Human Trafficking (large, real users); Material Science Research (70,000 paper abstracts, built in 1 week); Arms Trafficking (identify illegal sales); Patent Trolls (identifies patent trolls); Predicting Cyber Attacks (combines diverse sources about vulnerabilities, exploits, etc.)
  • 70. Conclusions: presented the end-to-end tool-chain to build domain-specific knowledge graphs; integrates heterogeneous data (web pages, databases, CSV, web APIs, images, etc.); approach scales to millions of pages and billions of facts; has been used to build real-world deployed applications
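
Slides 27 and 28 describe labeling a new column by comparing it with already labeled columns: TF-IDF cosine similarity for text, and the Kolmogorov-Smirnov test for numbers. The sketch below is one possible reading of that idea, not the authors' implementation. The function names, the smoothed IDF formula, and the numeric sample values are assumptions made for illustration; the numeric part computes only the two-sample KS statistic (the largest gap between empirical CDFs), not a p-value.

```python
import math
from bisect import bisect_right
from collections import Counter


def tfidf_vectors(columns):
    """Treat each column (label -> list of strings) as one document."""
    docs = {label: Counter(tok for v in vals for tok in v.lower().split())
            for label, vals in columns.items()}
    n_docs = len(docs)
    doc_freq = Counter(tok for counts in docs.values() for tok in counts)
    vectors = {}
    for label, counts in docs.items():
        # Smoothed IDF (an assumption; other weightings would also work).
        vectors[label] = {
            tok: tf * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, tf in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_text_label(labeled, new_values):
    """Return the label whose column is most similar to the new text column."""
    columns = dict(labeled)
    columns["__new__"] = new_values  # hypothetical placeholder key
    vectors = tfidf_vectors(columns)
    target = vectors.pop("__new__")
    return max(vectors, key=lambda label: cosine(target, vectors[label]))


def ks_statistic(xs, ys):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for point in sorted(set(xs) | set(ys)):
        cdf_x = bisect_right(xs, point) / len(xs)
        cdf_y = bisect_right(ys, point) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap


def best_numeric_label(labeled, new_values):
    """Return the label whose value distribution is closest (smallest KS gap)."""
    return min(labeled, key=lambda label: ks_statistic(labeled[label], new_values))


# Text values taken from the table on slide 26.
text_columns = {"name": ["Fred Collins", "Tina Peterson"],
                "city": ["Seattle", "New York"]}
# Numeric values are hypothetical, chosen only to illustrate the comparison.
numeric_columns = {"price": [700, 1000, 850], "year": [1959, 1980, 2015]}

text_guess = best_text_label(text_columns, ["Tina Collins", "Fred Peterson"])
numeric_guess = best_numeric_label(numeric_columns, [650, 900, 1100])
```

A smaller KS statistic means the two value distributions look more alike, so the numeric helper takes the minimum while the text helper takes the maximum cosine score.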

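Slides 40 to 43 outline the common pairwise approach to entity resolution: compare two records field by field, combine the field similarities with weights (0.5, 0.2, 0.3), and accept the pair if the score reaches the threshold of 0.8. The sketch below illustrates that scheme under several assumptions: which measure and weight belongs to which field is a guess, `difflib.SequenceMatcher` is swapped in for Jaro similarity (the Python standard library has no Jaro function), and the price measure is a simple relative difference. Record and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Assumed assignment of the slide's weights to fields.
WEIGHTS = {"manufacturer": 0.5, "price": 0.2, "title": 0.3}
THRESHOLD = 0.8  # acceptance threshold from slide 40


def string_sim(a, b):
    """Stand-in for Jaro: SequenceMatcher ratio in [0, 1]; missing value -> 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a, b):
    """Jaccard similarity over lowercase word sets; missing value -> 0."""
    tokens_a = set((a or "").lower().split())
    tokens_b = set((b or "").lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def price_sim(a, b):
    """1 minus the relative price difference; missing value -> 0."""
    if a is None or b is None:
        return 0.0
    highest = max(a, b)
    return 1.0 - abs(a - b) / highest if highest > 0 else 1.0


def pair_score(r1, r2):
    """Weighted sum of the per-field similarities."""
    return (WEIGHTS["manufacturer"] * string_sim(r1["manufacturer"], r2["manufacturer"])
            + WEIGHTS["price"] * price_sim(r1["price"], r2["price"])
            + WEIGHTS["title"] * jaccard(r1["title"], r2["title"]))


def same_entity(r1, r2):
    """Accept the pair only if the weighted score reaches the threshold."""
    return pair_score(r1, r2) >= THRESHOLD


# Records from the product table on slide 40.
quiet_comfort = {"price": 299.0, "manufacturer": "Bose Electronic",
                 "title": "Quiet Comfort 25 Noise Cancelling Headphone"}
# The slide lists two prices (299, 229) for this record; one is kept here
# because this sketch does not handle multi-valued fields.
bose_nc = {"price": 229.0, "manufacturer": "Bose",
           "title": "Bose Noise Cancelling Headphones"}
dish_washer = {"price": 599.0, "manufacturer": "Bosch", "title": "Dish Washer"}
```

With these assumptions the two Bose headphone records fall below the threshold even though they plausibly describe the same product. Missing values (which score zero here) and multi-valued fields are also not handled well, which matches the difficulties that slides 41 and 42 name.
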
Editor's Notes

  1. Karma offers suggestions on how to do the mapping
  2. Tokenize values in a given labeled column into pure alphabetic, numeric, and symbol tokens. Extract features from the tokens and the column name, and associate them with the column's semantic type.
  3. Why is linking significant in this domain? Slide shows why.
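
Note 2 describes splitting column values into purely alphabetic, numeric, and symbol tokens and deriving features from them and from the column name. A minimal sketch of that step follows; the regular expression (ASCII letters and digits only), the feature names, and the choice of token-kind fractions as features are assumptions for illustration, not the feature set used in the talk.

```python
import re
from collections import Counter

# One named group per token kind; any other non-space character is a symbol.
TOKEN_RE = re.compile(
    r"(?P<alpha>[A-Za-z]+)|(?P<numeric>[0-9]+)|(?P<symbol>[^A-Za-z0-9\s])"
)


def tokenize(value):
    """Split a value into (kind, text) pairs: alpha, numeric, or symbol."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(value)]


def column_features(column_name, values):
    """Hypothetical features: share of each token kind plus column-name tokens."""
    kinds = Counter(kind for v in values for kind, _ in tokenize(v))
    total = sum(kinds.values()) or 1  # avoid division by zero on empty columns
    features = {f"frac_{k}": kinds[k] / total
                for k in ("alpha", "numeric", "symbol")}
    features["name_tokens"] = [text.lower() for _, text in tokenize(column_name)]
    return features


# Values taken from the tables on slides 25 and 31.
address_tokens = tokenize("1711 Anderson Rd")
phone_tokens = tokenize("601-813-7280")
price_features = column_features("Column-2", ["1,000", "700"])
```

Features like these would then be paired with the column's known semantic type to form training examples, as the note describes.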