Efficient Tabling of Structured Data Using Indexing and Program Transformation
1. Efficient Tabling of Structured Data
Using Indexing and Program Transformation
Christian Theil Have and Henning Christiansen
Research group PLIS: Programming, Logic and Intelligent Systems
Department of Communication, Business and Information Technologies
Roskilde University, P.O.Box 260, DK-4000 Roskilde, Denmark
PADL in Philadelphia, January 23, 2012
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
2. Outline
1 Motivation and background
The trouble with tabling of structured data
2 A workaround implemented in Prolog
Examples
Example: Edit Distance
Example: Hidden Markov Model in PRISM
3 An automatic program transformation
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
3. Motivation and background
Motivation
In the LoSt project we explore the use of Probabilistic Logic
programming (with PRISM) for biological sequence analysis.
We had some problems analyzing very long sequences..
.. and identified the culprit: Tabling of structured data.
Inspired by earlier work by Christiansen and Gallagher which address a
similar problem related to tabling non-discrimininatory arguments.
Henning Christiansen, John P. Gallagher
Non-discriminating Arguments and their Uses.
ICLP 2009.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
4. Motivation and background The trouble with tabling of structured data
The trouble with tabling of structured data
Tabling in logic programming is an established technique which,
can give a significant speed-up of program execution.
make it easier to write efficient programs in a declarative style.
is similar to memoization in functional programming.
The idea:
The system maintains a table of calls and their answers.
when a new call is entered, check if it is stored in the table
if so, use previously found solution.
Tabling is included in several recognized Prolog systems such as B-Prolog,
YAP and XSB.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
5. Motivation and background The trouble with tabling of structured data
The trouble with tabling of structured data
An innocent looking call: last([1,2,3,4,5],X)
predicate: last/2 last([1,2,3,4,5],X)
last([X],X). last([1,2,3,4],X)
last([_|L],X) :- last([1,2,3],X)
last(L,X). last([1,2],X)
last([1],X)
Traverses a list to find
the last element. call table
Time/space complexity: last([1,2,3,4,5],X).
O(n). last([1,2,3,4],X).
last([1,2,3],X).
If we table last/2:
last([1,2],X).
n + n − 1 + n − 2...1
last([1],X).
≈ O(n2 ) !
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
6. Motivation and background The trouble with tabling of structured data
Wait a minute..
Tabling systems do employ some advanced techniques to avoid the
expensive copying and which may reduce memory consumption and/or
time complexity.
For instance,
B-Prolog uses hashing of goals.
XSB uses a trie data structure.
YAP uses a trie structure, which is refined into a so-called global trie
which applies a sharing strategy for common subterms whenever
possible.
These techniques may reduce space consumption, but if there is no sharing
between the tables and the actual arguments of an active call, each
execution of a call may involve a full traversal (naive copying) of its
arguments.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
7. Motivation and background The trouble with tabling of structured data
Benchmarking last/2 for B-Prolog, Yap and XSB
To investigate, we benchmarked the tabled version of last/2 for each of
the major Prolog tabling engines.
To further investigate whether performance is data dependent, we
benchmarked with both repeated data and random data.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
8. Motivation and background The trouble with tabling of structured data
Space usage, tabled last/2 (1)
b) Space usage
2000000
●
● XSB (random data) ●
● B−Prolog (random data) ●
Yap (random data)
●
XSB (repeated data)
1500000
B−Prolog (repeated data) ●
Yap (repeated data)
space usage (kilobytes)
●
●
●
1000000
●
●
●
●
●
●
500000
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0
● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0 500 1000 1500 2000 2500 3000
list length (N)
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
9. Motivation and background The trouble with tabling of structured data
Space usage, tabled last/2 (2)
d) Space usage expanded for the four lower curves
10000
● B−Prolog (random data)
XSB (repeated data)
B−Prolog (repeated data)
8000
Yap (repeated data)
space usage (kilobytes)
6000
4000
2000
● ●
● ●
● ●
● ●
● ●
● ●
● ●
● ●
0
0 5000 10000 15000
list length (N)
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
10. Motivation and background The trouble with tabling of structured data
Time usage, tabled last/2 (1)
a) Time usage
●
4
● XSB (random data) ●
● B−Prolog (random data) ●
Yap (random data)
●
XSB (repeated data)
B−Prolog (repeated data) ●
3
Yap (repeated data) ●
time usage (seconds)
●
●
●
●
2
●
●
●
●
●
●
1
●
●
●
● ●
●
● ●
●
●
● ●
● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0
0 500 1000 1500 2000 2500 3000
list length (N)
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
11. Motivation and background The trouble with tabling of structured data
Time usage, tabled last/2 (2)
●
c) Time usage expanded for the three lower curves
●
1.0
●
● B−Prolog (random data)
XSB (repeated data) ●
Yap (repeated data) ●
0.8
●
●
time usage (seconds)
●
0.6
●
●
●
●
●
0.4
●
●
●
●
●
●
●
●
●
●
0.2
●●
●
●●
● ●
●
●●
●
●●
● ●●
●●
●●●
●●●●
0.0
0 10000 20000 30000 40000 50000
list length (N)
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
12. A workaround implemented in Prolog
A workaround implemented in Prolog
We describe a workaround giving O(1) time and space complexity for table
lookups for programs with arbitrarily large ground structured data as input
arguments.
A term is represented as a set of facts.
A subterm is referenced by a unique integer serving as an abstract
pointer.
Matching related to tabling is done solely by comparison of such
pointers.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
13. A workaround implemented in Prolog
An abstract data type
The representation is given by the following predicates which all together
can be understood as an abstract datatype.
store term( +ground-term, pointer )
The ground-term is any ground term, and the pointer
returned is a unique reference (an integer) for that term.
retrieve term( +pointer , ?functor , ?arg-pointers-list)
Returns the functor and a list of pointers to representations
of the substructures of the term represented by pointer.
full retrieve term( +pointer , ?ground-term)
Returns the term represented by pointer.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
14. A workaround implemented in Prolog
ADT properties
Property 1
It must hold for any ground term s, that the query
store term(s, P), full retrieve term(P, S),
assigns to the variable S a value identical to s.
Property 2
It must hold for any ground term s of the form f (s1 , . . . ,sn ) that
store term(s, P), retrieve term(P, F, Ss),
assigns to the variable F the symbol f , and to Ss a list of ground values
[p1 ,. . .,pn ] such that additional queries
full retrieve term(pi , Si ), i = 1, . . . , n
assign to the variables Si values identical to si .
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
15. A workaround implemented in Prolog
ADT example
Example
The following call converts the term f(a,g(b)) into its internal
representation and returns a pointer value in the variable P.
store_term(f(a,g(b)),P).
After this, the following sequence of calls will succeed.
retrieve_term(P,f,[P1,P2]),
retrieve_term(P1,a,[]),
retrieve_term(P2,g,[P21]),
retrieve_term(P21,b,[]),
full_retrieve_term(P,f(a,g(b))).
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
16. A workaround implemented in Prolog
Implementation with assert
One possible way of implementing the predicates introduced above is to
have store term/2 asserting facts for the retrieve term/3 predicate
using increasing integers as pointers.
Example
The call store term(f(a,g(b)),P) considered in example 1 may assign
the value 100 to P and as a side-effect assert the following facts.
retrieve_term(100,f,[101,102]).
retrieve_term(101,a,[]).
retrieve_term(102,g,[103]).
retrieve_term(103,b,[]).
Notice that Prolog’s indexing on first arguments ensures a constant lookup
time.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
17. A workaround implemented in Prolog
Another level of abstraction: lookup pattern/2
Finally we introduce a utility predicate which may simplify the use of the
representation in application programs. It utilizes a special kind of terms,
called patterns, which are not necessarily ground and which may contain
subterms of the form lazy(variable).
lookup pattern( +pointer , +pattern)
The pattern is matched in a recursive way against the term
represented by the pointer p in the following way.
– lookup pattern(p,X) is treated as
full retrieve term(p,X).
– lookup pattern(p,lazy(X)) unifies X with p.
– For any other pattern =.. [F,X1 ,. . . ,Xn ] we call
retrieve term(p, F, [P1 ,. . .,Pn ])
followed by lookup pattern(Pi ,Xi ), i = 1, . . . , n.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
18. A workaround implemented in Prolog
A lookup pattern/2 example
Example
Continuing the previous example, we get that after the call
store term(f(a,g(b)),P).
lookup_pattern(100, f(X,lazy(Y)))
leads to X=a and Y=102.
The lookup pattern/2 predicate will be use to simplify the automatic
program transformation introduced later.
Further efficiency can be gained by compiling it out for each specific
pattern (i.e. replacing it with calls to retrieve term/2).
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
19. A workaround implemented in Prolog Examples
Examples
We consider two applying the workaround to two example programs:
Edit Distance implemented in Prolog
Hidden Markov Models in PRISM
For each of these we measure the impact (in time and space) of a
transformed version using the workaround.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
20. A workaround implemented in Prolog Examples
Implementation of store term/2 and retrieve term/2
The programs have been transformed manually for these experiments
based on the pointer based representation previously introduced, but
simplified slightly for lists.
For store term/2 and retrieve term/2 we use the following simple
implementation:
store_term([],Index) :- assert(retrieve_term([],Index)).
store_term([X|Xs],Idx) :-
Idx1 is Idx + 1,
assert(retrieve_term(Idx,[X,Idx1])),
store_term(Xs,Idx1).
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
21. A workaround implemented in Prolog Examples
Edit Distance
Calculate the minimum number of edits (insertions, deletions and
replacements) to transform on list into another.
a minimal edit-distance algorithm written in Prolog which is
dependent on tabling for any non-trivial problem.
The theoretical best time complexity of edit distance has been proven
to be O(N 2 ).
Measure for various problem sizes.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
22. A workaround implemented in Prolog Examples
Edit distance: Implementation in Prolog.
:- table edit/3. edit([X|Xs],[Y|Ys],Dist) :-
edit([X|Xs],Ys,InsDist),
edit([],[],0). edit(Xs,[Y|Ys],DelDist),
edit(Xs,Ys,TailDist),
edit([],[Y|Ys],Dist) :- (X==Y ->
edit([],Ys,Dist1), Dist = TailDist
Dist is 1 + Dist1. ;
% Minimum of insertion,
edit([X|Xs],[],Dist) :- % deletion or substitution
edit(Xs,[],Dist1), sort([InsDist,DelDist,TailDist],
Dist is 1 + Dist1. [MinDist|_]),
Dist is 1 + MinDist).
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
23. A workaround implemented in Prolog Examples
Edit distance, transformed (1).
original version transformed version
edit(XIdx,YIdx,0) :-
edit([],[],0). retrieve_term(XIdx,[]),
retrieve_term(YIdx,[]).
edit(XIdx,YIdx,Dist) :-
edit([],[Y|Ys],Dist) :- retrieve_term(XIdx,[]),
retrieve_term(YIdx,[_,YIdxNext]),
edit([],Ys,Dist1), edit(XIdx,YIdxNext,Dist1),
Dist is 1 + Dist1. Dist is Dist1 + 1.
edit(XIdx,YIdx,Dist) :-
edit([X|Xs],[],Dist) :- retrieve_term(YIdx,[]),
retrieve_term(XIdx,[_,XIdxNext]),
edit(Xs,[],Dist1), edit(XIdxNext,YIdx,Dist1),
Dist is Dist1 + 1.
Dist is 1 + Dist1.
continued on next slide...
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
24. A workaround implemented in Prolog Examples
Edit distance, transformed (2).
original version transformed version
edit([X|Xs],[Y|Ys],Dist) :- edit(XIdx,YIdx,Dist) :-
retrieve_term(XIdx,[X,NextXIdx]),
retrieve_term(YIdx,[Y,NextYIdx]),
edit([X|Xs],Ys,InsDist), edit(XIdx,NextYIdx,InsDist),
edit(Xs,[Y|Ys],DelDist), edit(NextXIdx,YIdx,DelDist),
edit(Xs,Ys,TailDist), edit(NextXIdx,NextYIdx,TailDist),
(X==Y -> (X==Y ->
Dist = TailDist Dist = TailDist
; ;
sort([InsDist,DelDist,TailDist], sort([InsDist,DelDist,TailDist],
[MinDist|_]), [MinDist|_]),
Dist is 1 + MinDist). Dist is 1 + MinDist).
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
27. A workaround implemented in Prolog Examples
PRISM
PRISM ( PRogramming In Statistical Modelling )
Extends Prolog (B-Prolog) with special goals representing random
variables (msws).
Semantics: Probabilistic Herbrand models.
Supports various probabilistic inferences.
Relies heavily on tabling for the efficiency of the probabilistic
inferences.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
28. A workaround implemented in Prolog Examples
A Hidden Markov Model in PRISM
values(init,[s0,s1]).
0.1
values(out(_),[a,b]).
values(tr(_),[s0,s1]). 0.8
s0
hmm(L):- 0.3: a
0.5 0.9 s1
msw(init,S), 0.7: b
0.2 0.6: a
hmm(S,L). init
0.4: b
0.5
hmm(_,[]).
hmm(S,[Ob|Y]) :-
msw(out(S),Ob),
msw(tr(S),Next),
hmm(Next,Y).
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
29. A workaround implemented in Prolog Examples
Transformed Hidden Markov Model
We only need to consider the recursive predicate, hmm/2.
original version transformed version
hmm(_,[]). hmm(S,ObsPtr):-
retrieve_term(ObsPtr,[]).
hmm(S,[Ob|Y]) :- hmm(S,ObsPtr) :-
msw(out(S),Ob), retrieve_term(ObsPtr,[Ob,Y]),
msw(tr(S),Next), msw(out(S),Ob),
hmm(Next,Y). msw(tr(S),Next),
hmm(Next,Y).
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
30. A workaround implemented in Prolog Examples
Hidden Markov Model: Benchmarking results (time)
b) Running time without indexed lookup a) Running time with indexed lookup
140
● ●
0.08
●●●
● ●
●
●
●
120
● ●
● ●
●●
●
Running time (seconds)
Running time (seconds)
●
100
0.06
● ●
●
● ● ●●
● ●
●
● ●
80
●
● ●
●
●
0.04
● ●●
●●
●
60
● ●
● ●
● ●●
●
●
●
40
● ●
● ●
0.02
● ●
●
● ●●
●
● ●
● ●
20
● ●
●
● ●
●
● ●●
●● ●
● ●●
●● ●
● ●
●●●
●●●●●●●●●●
0.00
0
●
0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 5000
sequence length sequence length
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
31. An automatic program transformation
An automatic program transformation
We introduce an automatic transformation from a tabled program to an
efficient version using our approach.
To support the transformation, the user must declare modes for which
predicate arguments that should be indexed.
table_index_mode(hmm(+))
table_index_mode(hmm(-,+))
Each clause whose head is covered by a table mode declaration is
transformed and all other clauses are left untouched.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
32. An automatic program transformation
Transformation of HMM in PRISM
The transformation moves any term appearing in an indexed position in
the head of a clause into a call to the lookup pattern predicate, which is
added to the body. Variables in such terms are marked lazy when they do
not occur in any non-indexed argument inside the clause.
original program transformed program
hmm(_,[]). hmm(S,ObsPtr):-
lookup_pattern(ObsPtr,[]).
hmm(S,[Ob|Y]) :- hmm(S,ObsPtr) :-
msw(out(S),Ob), lookup_pattern(ObsPtr,[Ob | lazy(Y)]),
msw(tr(S),Next), msw(out(S),Ob),
hmm(Next,Y). msw(tr(S),Next),
hmm(Next,Y).
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
33. An automatic program transformation
Each clause whose head predicate is covered by a table mode de
Transformation algorithm
transformed using the procedure outlined in algorithm 1, and all o
are left untouched. The transformation moves any term appearing in
for each clause H:-B in original program do
if table index mode (M) matching H then
for each argument Hi ∈ H, Mi ∈M do
if Mi =’+’ then
Hi ← MarkLazy(Hi , B)
B ← (lookup pattern(Vi , Hi ), B)
Hi ← V i
end
end
where MarkLazy is defined as
MarkLazy(Hi ,B) :
P otentialLazy = variables in all goals G ∈ B
where G has table index mode declaration of Structured Data
Christian Theil Have and Henning Christiansen Efficient Tabling
34. B ← (lookup pattern(Vi , Hi ), B)
An automatic program transformation
Hi ← V i
end
Transformation algorithm: MarkLazy
end
where MarkLazy is defined as
MarkLazy(Hi ,B) :
P otentialLazy = variables in all goals G ∈ B
where G has table index mode declaration
N onLazy = variables in all goals G ∈ B
where G has no table index mode declaration
Lazy = P otentialLazy N onLazy
for each variable V ∈ Hi do
if V ∈ Lazy then
V ← lazy(V )
end
Algorithm 1: Program transformation.
position in the head of a clause into a call to theStructured Data pattern
Christian Theil Have and Henning Christiansen Efficient Tabling of
lookup
35. An automatic program transformation
Conclusions
All Prolog implementations handle tabling of structured data
inefficiently.
We presented a Prolog based program transformation that ensures
O(1) time and space complexity of tabled lookups.
The transformation is data invariant and works with all the existing
tabling systems.
The transformation makes it possible to scale to much larger problem
instances.
Some limitations and problems to be solved:
Only applies to ground input arguments.
Abstract pointers in head of clauses circumvents usual pattern based
indexing (constant time overhead).
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data
36. An automatic program transformation
The road ahead...
Our program transformation should be seen as workaround, until such
optimizations find their way into the tabling systems. We hope that
Prolog implementors will pick up on this and integrate such optimizations
directly in the tabling systems, so that the user does not need to transform
his program, and need not worry about the underlying tabled
representation and its implicit complexity.
Christian Theil Have and Henning Christiansen Efficient Tabling of Structured Data