The document discusses using least common ancestor queries and the range minimum problem to efficiently pull out subtrees from a dendrogram for gene ontology analysis. It proposes:
1) Preprocessing the tree in linear time so the least common ancestor of two nodes can be returned in constant time.
2) Using this to linearize the tree and reduce it to finding the minimum value in a range of an array, which can also be done in constant time by preprocessing the array.
3) A divide and conquer approach to preprocess the array in linear time and space.
3. GO Analysis on Dendrogram
• For each GO term
– Pull out the skeleton sub-tree for this term
– Test for significance only at the nodes of this skeleton tree
• How does one pull out the skeleton sub-tree in time proportional to the size of that subtree?
4. LCA Queries
• Preprocess the tree in linear time, so that...
• Given two nodes, the least common ancestor can be returned in O(1) time
5. LCA Queries on a Line Graph
• Rank nodes in order from top to bottom
• The LCA of two nodes is the one with the smaller rank, so take the min of the two ranks
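The line-graph case can be sketched in a few lines of Python. The function name `lca_on_path` and the `rank` dictionary are illustrative assumptions, not part of the slides.

```python
def lca_on_path(rank, u, v):
    """Return the LCA of u and v on a path whose top node has rank 0."""
    # On a path, the ancestor of the two nodes is simply the higher one,
    # i.e. the node with the smaller rank.
    return u if rank[u] <= rank[v] else v


# Hypothetical path a -> b -> c -> d, with a at the top.
rank = {"a": 0, "b": 1, "c": 2, "d": 3}
print(lca_on_path(rank, "b", "d"))  # prints b
```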
6. Linearizing a Tree
• Label each node with its distance from the root
• Euler Tour to linearize nodes in an array (size 2n)
• The LCA of two nodes is the node with the least label in the range of the array between their occurrences
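A minimal sketch of this reduction, assuming the tree is given as a dictionary mapping each node to its list of children (a representation chosen here for illustration). The names `euler_tour` and `lca_naive` are hypothetical, and the range minimum is found by a plain scan, so a query is not yet O(1).

```python
def euler_tour(children, root):
    """Return (tour, depths, first): the Euler tour, the depth label of each
    tour entry, and the first tour position of every node."""
    tour, depths, first = [root], [0], {root: 0}
    # Iterative DFS; each stack entry is (node, depth, iterator over children).
    stack = [(root, 0, iter(children.get(root, [])))]
    while stack:
        node, d, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            if stack:  # stepping back up: record the parent again
                parent, pd, _ = stack[-1]
                tour.append(parent)
                depths.append(pd)
        else:
            first[child] = len(tour)
            tour.append(child)
            depths.append(d + 1)
            stack.append((child, d + 1, iter(children.get(child, []))))
    return tour, depths, first


def lca_naive(tour, depths, first, u, v):
    """LCA = node with the least depth label between the first occurrences."""
    lo, hi = sorted((first[u], first[v]))
    i = min(range(lo, hi + 1), key=depths.__getitem__)
    return tour[i]


# Hypothetical 6-node tree: r has children a and b; a has c and d; b has e.
children = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
tour, depths, first = euler_tour(children, "r")
print(tour)    # prints ['r', 'a', 'c', 'a', 'd', 'a', 'r', 'b', 'e', 'b', 'r']
print(depths)  # prints [0, 1, 2, 1, 2, 1, 0, 1, 2, 1, 0]
print(lca_naive(tour, depths, first, "c", "d"))  # prints a
print(lca_naive(tour, depths, first, "d", "e"))  # prints r
```

The tour has 2n - 1 entries for n nodes, and adjacent depth labels always differ by exactly 1, which the later slides rely on.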
7. The Range Minimum Problem
• Given an array of size 2n, preprocess it in linear time, so that...
• Given a range, the min item in that range can be returned in O(1) time
8. Divide and Conquer
• Split into blocks at various levels of granularity
– For each block, compute all prefix and suffix mins
– Total space/preprocessing time O(n log n)
9. Query Handling
• Given the query range
– Determine the granularity level at which this range straddles adjacent blocks
• This is the first (most significant) bit at which the two endpoints differ, found in O(1) time
– Look up the appropriate prefix/suffix mins in each block
• Look up precomputed tables in O(1) time
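The preprocessing and query steps of the last two slides can be sketched together as follows. This is one possible reading, assuming block sizes 1, 2, 4, ... (one level per bit position) and storing indices of minima rather than values. The class name `PrefixSuffixRMQ` is hypothetical.

```python
class PrefixSuffixRMQ:
    """Range-minimum index queries with O(n log n) preprocessing."""

    def __init__(self, a):
        self.a = list(a)
        n = len(self.a)
        self.prefix, self.suffix = [], []
        # Level j uses blocks of size 2**j; enough levels to cover any (l, r).
        for j in range(max(1, (n - 1).bit_length())):
            size = 1 << j
            pre, suf = list(range(n)), list(range(n))
            for start in range(0, n, size):
                end = min(start + size, n)
                # pre[i]: index of the min of a[start..i]
                for i in range(start + 1, end):
                    if self.a[pre[i - 1]] <= self.a[i]:
                        pre[i] = pre[i - 1]
                # suf[i]: index of the min of a[i..end-1]
                for i in range(end - 2, start - 1, -1):
                    if self.a[suf[i + 1]] < self.a[i]:
                        suf[i] = suf[i + 1]
            self.prefix.append(pre)
            self.suffix.append(suf)

    def argmin(self, l, r):
        """Index of the leftmost minimum of a[l..r], inclusive, with l <= r."""
        if l == r:
            return l
        # Highest bit where l and r differ: at block size 2**j the two
        # endpoints fall in adjacent blocks.
        j = (l ^ r).bit_length() - 1
        i, k = self.suffix[j][l], self.prefix[j][r]
        return i if self.a[i] <= self.a[k] else k


depths = [0, 1, 2, 1, 2, 1, 0, 1, 2, 1, 0]  # depth labels of an Euler tour
rmq = PrefixSuffixRMQ(depths)
print(rmq.argmin(2, 8))  # prints 6
```

Python's `int.bit_length` stands in for the constant-time "first bit diff" operation. The query itself performs two table lookups and one comparison.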
10. Reducing Preprocessing Time
• Consider blocks of size $\Delta = \log n / 3$
• Two blocks are said to be equivalent if all within-block queries return the same min-index for the two blocks
• How many equivalence classes are there?
– Recall the Euler Tour: adjacent numbers differ by +/-1 (e.g. +1 -1 +1 -1 -1 -1 -1 +1 -1 +1), so there are at most $2^{\Delta} = n^{1/3}$ classes
– For each distinct class and each of the possible $\log^2 n / 9$ queries, precompute answers and store
– This takes time $O(n^{1/3} \log^2 n) = O(n)$
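The equivalence-class idea can be illustrated with a small sketch. A block is encoded by its +/-1 step pattern as a bitmask, and one answer table is built per pattern. The names `signature` and `build_class_tables` are hypothetical. A block of Δ entries has Δ - 1 steps, so the sketch enumerates 2^(Δ-1) patterns, which is within the bound on the slide.

```python
def signature(block):
    """Encode the +/-1 steps of a block as a bitmask (bit k set if step k is +1)."""
    sig = 0
    for k in range(len(block) - 1):
        if block[k + 1] - block[k] == 1:
            sig |= 1 << k
    return sig


def build_class_tables(delta):
    """For every step pattern of a length-delta block, tabulate the offset of
    the minimum for all in-block queries (i, j) with i <= j."""
    tables = {}
    for sig in range(1 << (delta - 1)):
        # Rebuild a representative block that starts at 0.
        block = [0]
        for k in range(delta - 1):
            block.append(block[-1] + (1 if (sig >> k) & 1 else -1))
        table = [[0] * delta for _ in range(delta)]
        for i in range(delta):
            best = i
            for j in range(i, delta):
                if block[j] < block[best]:
                    best = j
                table[i][j] = best
        tables[sig] = table
    return tables


tables = build_class_tables(4)
print(len(tables))  # prints 8
# Two blocks with different values but the same step pattern share one table.
x, y = [5, 6, 5, 4], [2, 3, 2, 1]
print(signature(x) == signature(y))  # prints True
print(tables[signature(x)][0][3])    # prints 3
```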
11. Overall Preprocessing
• Compute the $O(n^{1/3} \log^2 n)$ data structure for blocks of size $\Delta = \log n / 3$
– Within-block queries can be answered using this
• Create a new array of size $2n / (\log n / 3) = 6n / \log n$ by replacing each block by just its minimum item
• Preprocess this array as before, but now in $O(n)$ time and space because the size of this array is just $O(n / \log n)$.
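Putting the pieces together, one way the overall structure might look is sketched below. Several choices are assumptions made for illustration: Δ is derived from the array length and rounded down, in-block tables are built lazily only for the step patterns that actually occur, and all names (`build_levels`, `levels_argmin`, `in_block_table`, `BlockedRMQ`) are hypothetical.

```python
import math


def build_levels(vals):
    """Prefix/suffix argmin tables for blocks of size 1, 2, 4, ..."""
    m = len(vals)
    levels = []
    for j in range(max(1, (m - 1).bit_length())):
        size = 1 << j
        pre, suf = list(range(m)), list(range(m))
        for start in range(0, m, size):
            end = min(start + size, m)
            for i in range(start + 1, end):          # prefix argmins
                if vals[pre[i - 1]] <= vals[i]:
                    pre[i] = pre[i - 1]
            for i in range(end - 2, start - 1, -1):  # suffix argmins
                if vals[suf[i + 1]] < vals[i]:
                    suf[i] = suf[i + 1]
        levels.append((pre, suf))
    return levels


def levels_argmin(vals, levels, l, r):
    """Index of a minimum of vals[l..r] via the first differing bit of l, r."""
    if l == r:
        return l
    pre, suf = levels[(l ^ r).bit_length() - 1]
    i, k = suf[l], pre[r]
    return i if vals[i] <= vals[k] else k


def in_block_table(steps):
    """table[i][j] = offset of the min of offsets i..j for a step pattern."""
    block = [0]
    for s in steps:
        block.append(block[-1] + s)
    size = len(block)
    table = [[0] * size for _ in range(size)]
    for i in range(size):
        best = i
        for j in range(i, size):
            if block[j] < block[best]:
                best = j
            table[i][j] = best
    return table


class BlockedRMQ:
    def __init__(self, a):
        self.a = list(a)
        n = len(self.a)
        # Assumption: delta = floor(log2(len(a)) / 3), at least 1.
        self.delta = d = max(1, int(math.log2(max(n, 2)) / 3))
        self.tables = {}     # one table per distinct step pattern
        self.block_sig = []  # step pattern of each block
        self.block_min = []  # index in a of each block's minimum
        for start in range(0, n, d):
            block = self.a[start:start + d]
            sig = tuple(block[k + 1] - block[k] for k in range(len(block) - 1))
            if sig not in self.tables:
                self.tables[sig] = in_block_table(sig)
            self.block_sig.append(sig)
            self.block_min.append(start + self.tables[sig][0][len(block) - 1])
        # Reduced array: each block replaced by its minimum item.
        self.top_vals = [self.a[i] for i in self.block_min]
        self.top = build_levels(self.top_vals)

    def _in_block(self, b, i, j):
        return b * self.delta + self.tables[self.block_sig[b]][i][j]

    def argmin(self, l, r):
        """Index of the leftmost minimum of a[l..r], inclusive, with l <= r."""
        d = self.delta
        bl, br = l // d, r // d
        if bl == br:
            return self._in_block(bl, l % d, r % d)
        # Tail of the left block, head of the right block, whole blocks between.
        cands = [self._in_block(bl, l % d, d - 1), self._in_block(br, 0, r % d)]
        if br - bl > 1:
            mid = levels_argmin(self.top_vals, self.top, bl + 1, br - 1)
            cands.append(self.block_min[mid])
        return min(cands, key=lambda i: (self.a[i], i))


# Hypothetical +/-1 walk standing in for the depth labels of an Euler tour.
walk = [0]
for k in range(599):
    walk.append(walk[-1] + (1 if (k * k + k // 3) % 5 < 2 else -1))
rmq = BlockedRMQ(walk)
print(rmq.delta, len(rmq.top_vals))  # prints 3 200
```

A query touches at most two in-block tables and one lookup in the reduced array, so it remains a constant number of table accesses.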