Applying Noisy Knowledge Graphs to Real Problems

Mayank Kejriwal
USC Information Sciences Institute
May 2019

The Web has lowered the barrier to entry!

Pump and dump schemes proliferate online

Quechua
Fula
Odia
Maithili
Bhojpuri
Uyghur
Mayan languages
Aboriginal languages
Tasmanian languages
Fang
Umbundu
Setswana
Afro-Asiatic
Khoisan
Fon
Yoruba
Peulh
Adangme
Erzya
Bashkir
Khakas
Udmurt
Ingush
Tagalog
Hiligaynon
Bikol
Waray
Native American dialects

What do these problems have in common (besides being really hard)?

1. Very messy, raw data, with both redundancy and irrelevance
2. Users are also producers, i.e., we cannot just ‘build’ the system and hand it off
3. Domains are largely non-analytic (e.g., we don’t have a model/equations for human trafficking)

Space of design decisions
[Diagram: Raw data, Search+GUI, and Representation + Infrastructure, with the intermediate components shown as question marks]

[Diagram: Search+GUI, Producer, Consumer]

[Diagram: Raw data, Knowledge Graph, Search+GUI, and Representation + Infrastructure, with the remaining components shown as question marks]

Domain-specific Insight Graphs (DIG)

Space of design decisions: example from human trafficking
[Diagram: Raw data, Knowledge Graph (KG), Search+GUI, and Representation + Infrastructure, with the design decisions filled in: Domain discovery, Define KG schema, Flexible inputs, Query reformulation, KG Construction]

The Knowledge Graph is noisy… how do we cope?

Answer: Strategize around each triangle
[Diagram: Search+GUI, Consumer, Producer]

Example from DIG: consumer triangle
[Diagram: Search+GUI, Consumer]

Anti-fragile query reformulation to satisfy user intent
SELECT ?ad ?ethnicity
WHERE {
  ?ad a :Ad ;
      :hair_color 'Auburn' ;
      :review_site_id 'cg9469f' ;
      :price_per_hour '500' ;
      :name 'Claire Gold' ;
      :ethnicity ?ethnicity .
}
[Diagram: Query Reformulation (keyword expansion, context broadening, constraint relaxation) turns the structured query into query 1 through query n along a precision-recall axis; the queries run against Elasticsearch (100M entities) and return ranked candidates]
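
The constraint-relaxation idea on this slide can be illustrated with a small sketch. This is not DIG's implementation: the function names, the in-memory list of ads, and the scoring rule (count of satisfied constraints) are all assumptions made for illustration, and a real system would send the relaxed queries to a search engine rather than loop over records.

```python
from itertools import combinations

# Constraints copied from the slide's example query.
CONSTRAINTS = {
    "hair_color": "Auburn",
    "review_site_id": "cg9469f",
    "price_per_hour": "500",
    "name": "Claire Gold",
}


def relaxed_queries(constraints):
    """Yield constraint subsets, from the full set down to single constraints.

    Hypothetical helper: each subset plays the role of one of the
    'query 1 ... query n' boxes, ordered from strict to loose.
    """
    keys = sorted(constraints)
    for size in range(len(keys), 0, -1):
        for subset in combinations(keys, size):
            yield {key: constraints[key] for key in subset}


def score(ad, constraints):
    """Count how many of the original constraints the ad satisfies."""
    return sum(1 for key, value in constraints.items() if ad.get(key) == value)


def rank_candidates(ads, constraints):
    """Return ads matched by at least one relaxed query, best matches first."""
    candidates = {}
    for query in relaxed_queries(constraints):
        for ad in ads:
            if all(ad.get(key) == value for key, value in query.items()):
                candidates[ad["id"]] = ad
    return sorted(
        candidates.values(),
        key=lambda ad: score(ad, constraints),
        reverse=True,
    )
```

With noisy extractions, an ad whose name was truncated or whose review-site ID is missing fails the strict query but still surfaces through a looser one, ranked below exact matches.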

Query-centric KG representation

Infrastructure: Leverage existing ecosystems (there are many!)

Other domains
Narcotics
Illegal weapons sales
Fraudulent shipments
Securities fraud
Causal exploration
Geopolitical forecasting
Cyberattack prediction

THOR: Text-enabled Humanitarian Operations in Real-time

Controlled (i.e., academic) measurements
[Chart: DARPA MEMEX Eval (90K pages). x-axis: Average Precision of Retrieved Pages, in bins from 0 - 0.1 up to <= 1.0; y-axis: % Questions, 0 to 100; legend: Point Fact, Cluster ID, Aggregate, Facet]
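
For readers unfamiliar with the metric on the chart's x-axis, here is a minimal sketch of average precision for a single question. The slides do not say exactly how the MEMEX evaluation computed it, so this is an assumed, common textbook variant that normalizes by the number of relevant pages actually retrieved; the function name is hypothetical.

```python
def average_precision(relevance):
    """Average precision for one ranked result list.

    `relevance` is a list of booleans, top-ranked result first.
    Assumption: normalize by the number of relevant results retrieved,
    not by the number of relevant pages in the whole corpus.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant page appears.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

A ranking that places relevant pages at positions 1 and 3 scores (1/1 + 2/3) / 2; a ranking with no relevant pages scores 0.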

In-use impact (sex trafficking)
100 million+ escort ads
3 years data coverage
2 billion triples
100 law enforcement offices
3 convictions

NY County District Attorney (HTRU)
MEMEX tools getting rolled out

Memex tools getting rolled out

Academic Output
~15 publications over the course of the program
• 7 more currently under review
• 2 best paper awards
• Upcoming special issue call on knowledge construction and management
• 2 upcoming books, incl. graduate-level textbook on knowledge graphs (MIT Press, 2018)
Multiple tutorials/demonstrations at top-tier academic conferences
• Tutorials on knowledge graph construction and data mining over Web corpora/unusual domains in KDD17, ISWC17, AAAI18, WWW18
• At ISWC17, only full-day tutorial accepted; had near-capacity attendance
• Demos at ISWC17, AAAI18 (nominated for Best Demo)
• Case study at CHI18
Selected papers
• Knowledge Graphs for Social Good: An Entity-centric Search Engine for the Human Trafficking Domain (IEEE Transactions on Big Data, 2017)
• Information Extraction in Illicit Domains (WWW, 2017)
• Unsupervised Entity Resolution on Multi-type Graphs (ISWC 2016)

Social Science Studies
[Figure panels: Broker, Rich club effect, Star cluster, Web formation]

“Ideal” Evaluation
• Subjective issue
• Architecture-level evaluation
• Ablation analysis

[Diagram: Raw data and Search+GUI, with the intermediate components shown as question marks]

Domain-specific Insight Graphs (DIG)

Structured query execution on noisy data
SELECT ?ad ?ethnicity
WHERE {
  ?ad a :Ad ;
      :hair_color 'Auburn' ;
      :review_site_id 'cg9469f' ;
      :price_per_hour '500' ;
      :name 'Claire Gold' ;
      :ethnicity ?ethnicity .
}
[Diagram: Query Reformulation (keyword expansion, context broadening, constraint relaxation) turns the structured query into query 1 through query n along a precision-recall axis; the queries run against Elasticsearch (100M entities) and return ranked candidates]

DIG capabilities
Aggregations
Facets
Dossier Generation
Networks
Provenance
Structured Queries
Interface Customization
• Capabilities that generic search engines like Google do not currently support
• Domain-specific
–Allows a user to specify her schema
–No prior constraints
• Insight
–Supports aggregations, network analysis, faceted search, dossiers...
• Graph
–Uses a knowledge graph representation + efficient NoSQL query reformulation

Users want situational awareness, i.e., to be equipped with actionable insights

How can we tell when two actors are really one and the same?
• Advanced name matching algorithm based on machine learning, phonetic similarity and illicit webpage-specific word embeddings
Abbie / Abby
Candy / Kandy
Kim / Kimmy
Lea / Leah
Nicki / Nikki
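
To make the phonetic-similarity ingredient concrete, here is a deliberately crude sketch. It is not the algorithm described on the slide, which also uses machine learning and domain-specific word embeddings; the phonetic key rules, the function names, and the 0.6 threshold are all assumptions chosen so that the name pairs on this slide match.

```python
from difflib import SequenceMatcher


def phonetic_key(name):
    """Build a rough phonetic key (hypothetical rules, English-centric).

    Maps 'ck' and 'c' to 'k', 'y' to 'i', drops 'h', drops vowels after
    the first letter, then collapses repeated letters.
    """
    s = "".join(ch for ch in name.lower() if ch.isalpha())
    if not s:
        return ""
    s = s.replace("ck", "k").replace("c", "k").replace("y", "i").replace("h", "")
    if not s:
        return ""
    key = s[0] + "".join(ch for ch in s[1:] if ch not in "aeiou")
    collapsed = []
    for ch in key:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def likely_same_name(a, b, threshold=0.6):
    """Guess whether two names are spelling variants of each other.

    Requires equal phonetic keys AND a character-level similarity above
    the (assumed) threshold, since the key alone is very coarse.
    """
    if phonetic_key(a) != phonetic_key(b):
        return False
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold
```

The second check matters because short keys collide easily: "Lea" and "Lou" share a key under these rules but are rejected by the string-similarity test.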

Other use-cases
• Evaluated on five investigative domains beyond human trafficking, each with its own domain-specific needs
–Narcotics
–Counterfeit Electronics Manufacturing
–Securities Fraud
–Mail Shipment Fraud
–Illegal Weapons Sales
• User engagement was high
–Investigators were able to customize their domain in just one day, with less than an hour of training
–Have expressed interest in continuing to refine and use the search engine internally

Relevance score
Matching search criteria highlighted
Image extraction + face and pose analytics using deep learning
Original URL

Dossier term
Activity timeline
Co-occurrence statistics
Related ads
