This talk was given at the 13th International Conference on Principles of Knowledge Representation and Reasoning (KR 2012), held in Rome, Italy, June 10-14, 2012, by Ilias Tachmazidis (FORTH).
Abstract:
We are witnessing an explosion of available data from the Web, government authorities, scientific databases, sensors and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling the vast amounts of data for these applications. In this paper, we consider nonmonotonic reasoning, which has traditionally focused on rich knowledge structures. In particular, we consider defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge data sets. Our experimental results demonstrate that defeasible reasoning with billions of data is performant, and has the potential to scale to trillions of facts.
2. Huge data sets coming from
◦ the Web, government authorities, scientific
databases, sensors and more
Defeasible logic
◦ is suitable for encoding commonsense knowledge
and reasoning
◦ avoids triviality of inference due to low-quality data
Defeasible logic has low complexity
◦ The consequences of a defeasible theory D can be
computed in O(N) time, where N is the number of
symbols in D
3. Reasoning is performed in the presence of
defeasible rules
Defeasible logic has been implemented for
in-memory reasoning; however, that approach is not
applicable to huge data sets
Solution: scalability/parallelization using the
MapReduce framework
Our approach is restricted to
single-argument reasoning
4. A multi-argument implementation has been
accepted at ECAI 2012 (to appear)
Ilias Tachmazidis, Grigoris Antoniou, Giorgos
Flouris, Spyros Kotoulas, and Lee
McCluskey, ‘Large-scale Parallel Stratified
Defeasible Reasoning’, in ECAI, (2012)
5. Facts
◦ e.g. bird(eagle)
Strict Rules
◦ e.g. bird(X) → animal(X)
Defeasible Rules
◦ e.g. bird(X) ⇒ flies(X)
6. • Priority Relation (acyclic relation on the set of
rules)
–e.g. r: bird(X) ⇒ flies(X)
r’: brokenWing(X) ⇒ ¬flies(X)
r’ > r
7. MapReduce is inspired by similar primitives in LISP and
other functional languages
Operates exclusively on <key, value> pairs
Input and Output types of a MapReduce job:
◦ Input : <k1, v1>
◦ Map(k1,v1) → list(k2,v2)
◦ Reduce(k2, list (v2)) → list(k3,v3)
◦ Output : list(k3,v3)
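The job signature above can be illustrated with a small single-process simulation. This is only a sketch of the <key, value> flow, not the Hadoop API and not the implementation used in the talk; `run_mapreduce`, `map_fact`, `reduce_count` and the one-fact-per-line input format are hypothetical names and assumptions introduced for illustration.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Simulate one MapReduce job in a single process (illustrative only)."""
    groups = defaultdict(list)
    # Map phase: each <k1, v1> record yields a list of <k2, v2> pairs.
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            # Shuffle: collect every v2 that shares the same key k2.
            groups[k2].append(v2)
    # Reduce phase: each <k2, list(v2)> yields a list of <k3, v3> pairs.
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


def map_fact(offset, line):
    # Assumed input format: one fact per line, e.g. "bird(eagle)".
    predicate, _, rest = line.partition("(")
    return [(predicate, rest.rstrip(")"))]


def reduce_count(predicate, arguments):
    # Count how many facts were seen for each predicate.
    return [(predicate, len(arguments))]


facts = [(0, "bird(eagle)"), (1, "bird(owl)"), (2, "brokenWing(owl)")]
result = run_mapreduce(facts, map_fact, reduce_count)
print(result)  # [('bird', 2), ('brokenWing', 1)]
```

In a real deployment the framework, not the developer, performs the grouping step between map and reduce; only the two user-supplied functions would carry over.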
8. Provides an infrastructure that takes care of
◦ distribution of data
◦ management of fault tolerance
◦ results collection
For a specific problem
◦ the developer writes a few routines that follow
the general interface
9. Rule decomposition
◦ the computation of each rule is assigned to a
computer in the cloud
◦ difficult to achieve balanced work distribution
Data decomposition
◦ a subset of data is assigned to each computer in
the cloud
◦ provides more fine-grained partitioning
◦ our solution is based on data decomposition
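One possible way to picture data decomposition is hash partitioning of facts across workers. The sketch below assumes facts are (predicate, argument) pairs and that the partitioning key is the argument value; both are assumptions made for illustration, and `worker_for` and `partition_facts` are hypothetical helpers rather than part of the presented system.

```python
import zlib


def worker_for(argument, num_workers):
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(argument.encode("utf-8")) % num_workers


def partition_facts(facts, num_workers):
    """Assign each fact to a worker based on its argument value."""
    partitions = [[] for _ in range(num_workers)]
    for predicate, argument in facts:
        partitions[worker_for(argument, num_workers)].append((predicate, argument))
    return partitions


facts = [("bird", "eagle"), ("bird", "owl"), ("brokenWing", "owl")]
partitions = partition_facts(facts, num_workers=4)
for index, chunk in enumerate(partitions):
    print(index, chunk)
```

Because the key is the argument value, all facts about the same individual land on the same worker, and the amount of work per worker depends on the data distribution rather than on how expensive a particular rule is.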
10. Rule set:
◦ r1 : bird(X) → animal(X)
◦ r2 : bird(X) ⇒ flies(X)
◦ r3 : brokenWing(X) ⇒ ¬flies(X)
◦ r3 > r2
Consider bird(eagle) and brokenWing(owl) as
facts
Note that flies(eagle) and ¬ flies(owl) are not
conflicting with each other!
Reasoning is performed, in isolation, for each
unique argument value
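The example above can be sketched as a map step that keys each fact by its argument value and a reduce step that applies the rule set to one individual at a time. This is a deliberately simplified sketch, not the authors' algorithm: it assumes r1 is strict and r2, r3 are defeasible, that rule bodies mention only facts (so a single pass suffices), and that a defeasible rule wins only if it is itself superior to every applicable rule for the opposite conclusion. The names `RULES`, `SUPERIOR`, `map_fact`, `reduce_argument` and `reason` are hypothetical, and "~" stands for ¬.

```python
from collections import defaultdict

# (name, kind, body predicate, head literal); "~" marks negation.
RULES = [
    ("r1", "strict", "bird", "animal"),
    ("r2", "defeasible", "bird", "flies"),
    ("r3", "defeasible", "brokenWing", "~flies"),
]
SUPERIOR = {("r3", "r2")}  # r3 > r2


def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal


def map_fact(fact):
    # Emit <argument value, predicate> so facts about one individual meet.
    predicate, argument = fact
    return [(argument, predicate)]


def reduce_argument(argument, predicates):
    known = set(predicates)
    # Strict rules fire whenever their body holds.
    for _, kind, body, head in RULES:
        if kind == "strict" and body in known:
            known.add(head)
    # Defeasible rules whose body holds are applicable.
    applicable = [(n, h) for n, k, b, h in RULES if k == "defeasible" and b in known]
    for name, head in applicable:
        attackers = [n for n, h in applicable if h == negate(head)]
        # Simplification: this rule must override every attacking rule.
        if all((name, attacker) in SUPERIOR for attacker in attackers):
            known.add(head)
    return [(argument, sorted(known))]


def reason(facts):
    groups = defaultdict(list)
    for fact in facts:
        for argument, predicate in map_fact(fact):
            groups[argument].append(predicate)
    return {k: v for arg in sorted(groups) for k, v in reduce_argument(arg, groups[arg])}


conclusions = reason([("bird", "eagle"), ("brokenWing", "owl")])
print(conclusions)
# {'eagle': ['animal', 'bird', 'flies'], 'owl': ['brokenWing', '~flies']}
```

Since each reduce call sees only the facts for one argument value, flies(eagle) and ~flies(owl) never meet and cannot conflict. The priority r3 > r2 matters only when both rules apply to the same individual.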
16. This work is the first to explore the feasibility
of nonmonotonic reasoning over huge data
sets
We considered nonmonotonic reasoning in
the form of defeasible logic and adapted the
MapReduce framework for parallelization
Our experimental results demonstrate that
◦ defeasible reasoning over billions of facts is
performant
◦ our approach has the potential to scale to trillions
of facts.