Horn Concerto – AKSW Colloquium

Preliminary results of ongoing work titled "Efficient Rule Mining on RDF Data". University of Leipzig, AKSW Colloquium, April 3rd, 2017.

  1. HORN CONCERTO: Efficient Rule Mining on RDF Data. Tommaso Soru, AKSW Colloquium, 03.04.2017
  2. RDF Rule Mining
     • RDFS/OWL rules are given as schema.
     • Schema-free datasets might have an implicit schema.
     • Why “mining”? Because rules are not visible in data.
  3. Motivation/1: Link Prediction Problem.
     Given a union of graphs G = G1 ∪ … ∪ GN, find new edges among vertices s and t in G.
  4. Motivation/2: Markov Logic Networks.
     Given a collection of first-order statements (evidence) and a set of weighted first-order rules, build an undirected weighted graph where nodes are statements and edges indicate dependency.
  5. Motivation/3: MANDOLIN’s pipeline (Markov Logic Networks)
     Input: Dataset(s)
     Stages: RDFS/OWL Interpretation, Rule Mining, Weight Learning, Grounding, Inference
     Output: Predicted Triples
  6. Rule Mining & Weight Learning
     Given a directed labelled graph, find rules and the weights associated with them. The rules are Horn clauses.
     w    rule
     w1   p(x, y) ← q(x, y)
     w2   p(x, y) ← q(y, x)
     w3   p(x, y) ← q(x, z) ∧ r(y, z)
     w4   p(x, y) ← q(x, z) ∧ r(z, y)
     w5   p(x, y) ← q(z, x) ∧ r(y, z)
     w6   p(x, y) ← q(z, x) ∧ r(z, y)
  7. Horn Clauses
     A Horn clause is a clause (a disjunction of literal patterns) with at most one positive, i.e. unnegated, literal pattern.
     p ∨ ¬q1 ∨ ¬q2 ∨ ... ∨ ¬qn
     Equivalently, p ← q1 ∧ q2 ∧ ... ∧ qn, where p is the head and q1 ∧ q2 ∧ ... ∧ qn is the body.
     Example of a literal pattern: p(x,y)
  8. Confidence score (weight)
     The confidence score of a rule is defined as the ratio of the occurrences of head and body together to the occurrences of the body.
     p ← q1 ∧ q2 ∧ ... ∧ qn, where p is the head and q1 ∧ q2 ∧ ... ∧ qn is the body.
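  9. The HORN CONCERTO approach
     Bayes’ Theorem: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
     Horn Clause: p ← q1 ∧ q2 ∧ ... ∧ qn
     Event-based confidence score, with $\vec{q}$ denoting the body q1 ∧ q2 ∧ ... ∧ qn:
     $P(p \mid \vec{q}) = \frac{P(p \cap \vec{q})}{P(\vec{q})} \approx \frac{|\{p \wedge \vec{q}\}|}{|\{\vec{q}\}|}$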
  10. Rules with $|\vec{q}| = 1$: p(x, y) ← q(x, y)
      Numerator $|\{p \wedge \vec{q}\}|$:
        SELECT ?p (COUNT(*) AS ?c) WHERE { ?x ?p ?y . ?x <target_q> ?y . FILTER(?p != <target_q>) } GROUP BY ?p
      Denominator $|\{\vec{q}\}|$:
        SELECT ?q (COUNT(*) AS ?c) WHERE { [] ?q [] } GROUP BY ?q
  11. Rules with $|\vec{q}| = 2$: p(x, y) ← q(x, z) ∧ r(z, y)
      Numerator $|\{p \wedge \vec{q}\}|$:
        SELECT ?q ?r (COUNT(*) AS ?c) WHERE { ?x ?q ?z . ?z ?r ?y . ?x <target_p> ?y } GROUP BY ?q ?r
      Denominator $|\{\vec{q}\}|$:
        SELECT (COUNT(*) AS ?c) WHERE { ?x <target_q> ?z . ?z <target_r> ?y }
  12. Optimizations
      • Select only the top N properties.
      • Order by descending score and prune when the score falls below a threshold T.
      • Cache scores in memory, as there might exist heads p1, p2 sharing the same body: pi(x,y) ← q(?,?), r(?,?).
      • Parallelize the algorithm by rule type.
  13. Evaluation Setup
      • 8 CPUs, 32 GB RAM, Ubuntu 16.04
      • Scalability study
        DBpedia Person (7 million triples)
        DBpedia 2016-04 (397 million triples)
      FUTURE
      • Rule effectiveness for link prediction
        FB15k (592 thousand triples)
        WN18 (151 thousand triples)
      • Rule quality (human judgment?)
  14. Preliminary results – DBpedia Person
      Approach                      | Runtime   | # Rules | Used RAM (GB)
      AMIE+                         | > 10 days | 6,337   | 4
      Ontological PF                | > 3 hours | > 1,000 | 4
      HORN CONCERTO (single-thread) | 1.7 hours | 3,125   | client: 0.2, server: 1.0
  15. Preliminary results – DBpedia 2016-04
      Approach                      | Runtime  | # Rules | Used RAM (GB)
      HORN CONCERTO (single-thread) | 11 hours | 887     | client: 0.2, server: N/A
  16. Discussion
      • AMIE+
        Cons: Indexes the graph in memory.
      • Ontological PF
        Pros: Very fast.
        Cons: Relies on schema data (types, domain, range).
      • Horn Concerto
        Pros: Works with a SPARQL endpoint, is fast even single-threaded, and may be able to outperform Ontological PF in RDF datasets with available schema.
  17. Thank you.
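
The confidence score used in the slides (occurrences of head and body together, divided by occurrences of the body) can be illustrated on a toy graph. The following is a minimal Python sketch, not the Horn Concerto implementation: it counts triples in memory rather than querying a SPARQL endpoint, and the triples, the property names, and the helper functions `confidence_pq` and `confidence_chain` are all hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES = {
    ("alice", "spouse", "bob"),
    ("alice", "livesWith", "bob"),
    ("carol", "spouse", "dave"),
    ("carol", "livesWith", "dave"),
    ("erin", "spouse", "frank"),
    ("alice", "bornIn", "leipzig"),
    ("leipzig", "locatedIn", "germany"),
    ("alice", "citizenOf", "germany"),
    ("carol", "bornIn", "rome"),
    ("rome", "locatedIn", "italy"),
}


def confidence_pq(triples, p, q):
    """Confidence of p(x, y) <- q(x, y): |{p and q}| / |{q}|."""
    body = {(s, o) for s, pred, o in triples if pred == q}
    head = {(s, o) for s, pred, o in triples if pred == p}
    if not body:
        return 0.0
    return len(body & head) / len(body)


def confidence_chain(triples, p, q, r):
    """Confidence of p(x, y) <- q(x, z) and r(z, y)."""
    # Index r-edges by subject so q(x, z) can be joined with r(z, y).
    r_index = defaultdict(set)
    for s, pred, o in triples:
        if pred == r:
            r_index[s].add(o)
    # Every (x, z, y) binding that satisfies the body.
    body = [
        (x, z, y)
        for x, pred, z in triples
        if pred == q
        for y in r_index.get(z, ())
    ]
    if not body:
        return 0.0
    head = {(s, o) for s, pred, o in triples if pred == p}
    # Body bindings for which the head p(x, y) also holds.
    support = sum(1 for x, _, y in body if (x, y) in head)
    return support / len(body)


if __name__ == "__main__":
    print(round(confidence_pq(TRIPLES, "livesWith", "spouse"), 2))   # 0.67
    print(confidence_chain(TRIPLES, "citizenOf", "bornIn", "locatedIn"))  # 0.5
```

In this toy data, two of the three `spouse` pairs also appear as `livesWith`, and one of the two `bornIn`/`locatedIn` paths is matched by a `citizenOf` edge. The counting mirrors what the paired SPARQL queries on slides 10 and 11 compute per candidate rule.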
