1. A multithreaded algorithm for network alignment

David F. Gleich, Computer Science, Purdue University
Arif Khan, Alex Pothen, Computer Science, Purdue University
Mahantesh Halappanavar, Pacific Northwest National Labs

[Figure: graphs A and B joined by the candidate-match edges L; an overlap is marked.]

Work supported by DOE CSCAPES Institute grant (DE-FC02-08ER25864), NSF CAREER grant 1149756-CCF, and the Center for Adaptive Super Computing Software Multithreaded Architectures (CASS-MT) at PNNL. PNNL is operated by Battelle Memorial Institute under contract DE-AC06-76RL01830.
2. Network alignment
What is the best way of matching graph A to B using only edges in L?

[Figure: graphs A and B joined by the candidate-match edges L; an overlap is marked.]

Find a 1-1 matching between vertices with as many overlaps as possible.
3. Network alignment
… is NP-hard
… has no approximation algorithm

Applications:
• Computer Vision
• Ontology matching
• Database matching
• Bioinformatics

objective = α matching + β overlap
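To make the objective concrete, here is a small sketch that counts overlaps and evaluates objective = α matching + β overlap for a given one-to-one matching. The edge-list representation and the names `MatchedEdge`, `count_overlaps` and `alignment_objective` are hypothetical; the slides do not fix a normalization for the overlap term, so this sketch counts each overlapping pair of matched edges once.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical representation: undirected edge lists over integer vertex ids.
using EdgeList = std::vector<std::pair<int, int>>;
// A chosen edge of L: vertex i of A is matched to vertex ip of B with weight w.
struct MatchedEdge { int i; int ip; double w; };

// An overlap is a pair of matched edges (i, ip), (j, jp) with (i, j) in A
// and (ip, jp) in B. Assumes a one-to-one matching and that A lists each
// undirected edge once, so every overlap is counted exactly once.
int count_overlaps(const EdgeList& A, const EdgeList& B,
                   const std::vector<MatchedEdge>& matching) {
  std::set<std::pair<int, int>> edgesB;
  for (const auto& [u, v] : B) { edgesB.insert({u, v}); edgesB.insert({v, u}); }
  std::map<int, int> partner;  // A-vertex -> matched B-vertex
  for (const auto& m : matching) partner[m.i] = m.ip;
  int overlaps = 0;
  for (const auto& [i, j] : A) {
    auto pi = partner.find(i), pj = partner.find(j);
    if (pi != partner.end() && pj != partner.end() &&
        edgesB.count({pi->second, pj->second}) > 0)
      ++overlaps;
  }
  return overlaps;
}

// objective = alpha * (total weight of matched edges) + beta * (overlap count)
double alignment_objective(const EdgeList& A, const EdgeList& B,
                           const std::vector<MatchedEdge>& matching,
                           double alpha, double beta) {
  double weight = 0.0;
  for (const auto& m : matching) weight += m.w;
  return alpha * weight + beta * count_overlaps(A, B, matching);
}
```

The α/β trade-off is visible directly: α = 0 ignores the weights in L and scores only the overlaps, while β = 0 reduces the problem to a max-weight bipartite matching.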
4. Network alignment
[Figure 2. The NetworkBLAST local network alignment algorithm. Given two input networks, a network alignment graph is constructed. Nodes in this graph correspond to pairs of sequence-similar proteins, one from each species, and edges correspond to conserved interactions. A search algorithm identifies highly similar subnetworks that follow a prespecified interaction pattern. Adapted from Sharan and Ideker [30].]
[Figure 3. Performance comparison of computational approaches.]
From Sharan and Ideker, Modeling cellular machinery through biological network comparison. Nat. Biotechnol. 24, 4 (Apr. 2006), 427–433.
5. Our contribution
Multi-threaded network alignment via a new, multi-threaded, linear-complexity approximation algorithm for max-weight bipartite matching
High performance C++ implementations
40-times faster (on 16 cores, Xeon E5-2670): 415 sec -> 10 sec
(C++ ~ 3, complexity ~ 2, threading ~ 8)
... enabling interactive computation!
www.cs.purdue.edu/~dgleich/codes/netalignmc
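If we read the three rough factors as multiplicative (an assumption, since the slide gives only approximate values), their product is in line with the measured end-to-end ratio:

```latex
\frac{415\,\mathrm{s}}{10\,\mathrm{s}} = 41.5 \approx 40,
\qquad
\underbrace{3}_{\text{C++}} \times \underbrace{2}_{\text{complexity}} \times \underbrace{8}_{\text{threading}} = 48
```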
6. … the best methods in a recent survey …
Bayati, Gleich, et al. TKDE forthcoming

Belief Propagation
Use a probabilistic relaxation and iteratively find the probability that an edge is in the matching, given the probabilities of its neighboring edges.

Klau's Matching Relaxation
Iteratively improve an upper bound on the solution via a sub-gradient method applied to the Lagrangian.
7. Each iteration involves
• Matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix vector products in a semi-ring, dot-products, axpy, etc.
• Bipartite max-weight matching using a different weight vector at each iteration
No “convergence”: 100-1000 iterations

Let x[i] be the score for each pair-wise match in L
for i=1 to ...
    update x[i] to y[i]
    compute a max-weight match with y
    update y[i] to x[i] (using match in MR)
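As an illustration of a "matrix vector product in a semi-ring", here is a CSR sparse matrix-vector product in the (max, +) semiring. The slides do not say which semiring each method uses, so the choice of (max, +), the `Csr` struct and `maxplus_matvec` are assumptions for illustration; replacing `max` with `+` and `+` with `*` gives the ordinary product.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Compressed sparse row storage (hypothetical minimal struct).
struct Csr {
  int nrows;
  std::vector<int> rowptr;   // size nrows + 1
  std::vector<int> colind;   // column of each stored entry
  std::vector<double> val;   // value of each stored entry
};

// y[i] = max over stored entries (i, j) of A(i, j) + x[j].
// A row with no entries yields -infinity, the "zero" of the (max, +) semiring.
std::vector<double> maxplus_matvec(const Csr& A, const std::vector<double>& x) {
  assert(A.rowptr.size() == static_cast<std::size_t>(A.nrows) + 1);
  const double neg_inf = -std::numeric_limits<double>::infinity();
  std::vector<double> y(A.nrows, neg_inf);
  // Rows are independent, so this outer loop is the natural parallel-for;
  // uneven row lengths are why dynamic scheduling helps.
  for (int i = 0; i < A.nrows; ++i)
    for (int k = A.rowptr[i]; k < A.rowptr[i + 1]; ++k)
      y[i] = std::max(y[i], A.val[k] + x[A.colind[k]]);
  return y;
}
```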
8.
9. The methods
Each iteration involves:
• Matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix vector products in a semi-ring, dot-products, axpy, etc.
• Bipartite max-weight matching using a different weight vector at each iteration

Belief Propagation
Listing 2. A belief-propagation message passing procedure for network alignment. See the text for a description of othermax and round heuristic.

    y(0) = 0, z(0) = 0, d(0) = 0, S(k) = 0
    for k = 1 to niter
        F = bound_{0,β}[S + S(k)^T]                  Step 1: compute F
        d = αw + Fe                                  Step 2: compute d
        y(k) = d − othermaxcol(z(k−1))               Step 3: othermax
        z(k) = d − othermaxrow(y(k−1))
        S(k) = diag(y(k) + z(k) − d)S − F            Step 4: update S
        (y(k), z(k), S(k)) ← γ^k (y(k), z(k), S(k))
                 + (1 − γ^k)(y(k−1), z(k−1), S(k−1)) Step 5: damping
        round heuristic (y(k))                       Step 6: matching
        round heuristic (z(k))                       Step 6: matching
    end
    return y(k) or z(k) with the largest objective value

… interpretation, the weight vectors are usually called messages as they communicate the “beliefs” of each “agent.” In this particular problem, the neighborhood of an agent represents all of the other edges in graph L incident on the same vertex in graph A (1st vector), all edges in L incident on the same vertex in graph B (2nd vector), or the edges in L that are “…
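The othermax steps (Step 3) can be computed in one pass per row by tracking the largest and second-largest values. The listing defers the definition to the paper text, so this sketch assumes othermaxrow(v) gives each entry of v the largest *other* entry in its row, clipped below at zero; entries of L are assumed grouped by row with CSR-style offsets `rowptr`, and `othermax_rows` is a hypothetical name. othermaxcol would be the same routine applied to column-grouped data.

```cpp
#include <cassert>
#include <vector>

// out[k] = max(0, max of v[k'] over the other entries k' in the same row as k).
// rowptr[i]..rowptr[i+1]-1 are the positions of row i's entries in v.
std::vector<double> othermax_rows(const std::vector<int>& rowptr,
                                  const std::vector<double>& v) {
  assert(!rowptr.empty() && rowptr.back() == static_cast<int>(v.size()));
  std::vector<double> out(v.size());
  const int nrows = static_cast<int>(rowptr.size()) - 1;
  for (int i = 0; i < nrows; ++i) {  // rows are independent (parallelizable)
    double max1 = 0.0, max2 = 0.0;   // starting at 0 implements the clipping
    int arg1 = -1;                   // position holding max1, if any is > 0
    for (int k = rowptr[i]; k < rowptr[i + 1]; ++k) {
      if (v[k] > max1) { max2 = max1; max1 = v[k]; arg1 = k; }
      else if (v[k] > max2) { max2 = v[k]; }
    }
    // The row maximum sees the runner-up; every other entry sees the maximum.
    for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
      out[k] = (k == arg1) ? max2 : max1;
  }
  return out;
}
```

With this helper, Step 3's y(k) = d − othermaxcol(z(k−1)) becomes an elementwise subtraction, so each BP iteration stays linear in the number of edges of L.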
10. The NEW methods
Each iteration involves:
• Matrix-vector-ish computations with a sparse matrix, e.g. sparse matrix vector products in a semi-ring, dot-products, axpy, etc.
• Approximate bipartite max-weight matching is used here instead!

Belief Propagation
Listing 2 (unchanged in Steps 1 to 5 and in the return value), with Step 6 now using approximate matching:
        round heuristic (y(k))                       Step 6: approx matching
        round heuristic (z(k))                       Step 6: approx matching
11. Approximation doesn't hurt the belief propagation algorithm
Randomly perturb one power-law graph to get A and B. Generate L from the true match plus random edges; the amount of randomness in L is set by its average expected degree.

[Figure: two panels plotting the fraction of correct matches against the expected degree of noise in L (p ⋅ n), from 0 to 20, for MR, ApproxMR, BP and ApproxBP; BP and ApproxBP are indistinguishable.]

Fig. 2. Alignment with a power-law graph shows the large effect that approximate rounding can have on solutions from Klau's method (MR). With that method, using exact rounding will yield the identity matching for all problems (bottom figure), whereas using the approximation results in over a …
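The L part of this synthetic setup can be sketched as follows. This is an assumption about how the noise is generated, not the authors' generator: hypothetical `make_noisy_L` keeps the true match (i, i) for every vertex and adds each other pair (i, j) independently with probability p, so each vertex gets about p ⋅ n noise edges, the quantity on the figure's x-axis. It does not build the perturbed power-law graphs A and B. Hypothetical `fraction_correct` scores a matching against the identity.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// L = {(i, i) for all i} plus each off-diagonal pair (i, j) with probability p.
// O(n^2) loop: fine for a sketch, not for large n.
std::vector<std::pair<int, int>> make_noisy_L(int n, double p, unsigned seed) {
  std::mt19937 rng(seed);
  std::bernoulli_distribution coin(p);
  std::vector<std::pair<int, int>> L;
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      if (i == j || coin(rng)) L.push_back({i, j});  // coin only drawn off-diagonal
  return L;
}

// mateA[i] is the B-vertex matched to A-vertex i, or -1 if unmatched.
// The true alignment is assumed to be the identity (same vertex ids in A and B).
double fraction_correct(const std::vector<int>& mateA) {
  if (mateA.empty()) return 0.0;
  int correct = 0;
  for (int i = 0; i < static_cast<int>(mateA.size()); ++i)
    if (mateA[i] == i) ++correct;
  return static_cast<double>(correct) / static_cast<double>(mateA.size());
}
```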
12. The methods (in more detail)

Belief Propagation
for i=1 to ...
    update x[i] to y[i]
    compute a max-weight match with y
    save y if it is the best result so far

Klau's Matching Relaxation
for i=1 to ...
    update x[i] to y[i]
    compute a max-weight match with y
    update y[i] to x[i] based on the match

The matching is incidental to the BP method, but integral to Klau's MR method.
13. On real-world problems, ApproxMR isn't so different
On a protein-protein alignment problem, there is little difference between exact and approximate matching.

[Figure: overlap vs. weight for BP, AppBP, AppMR and MR; the upper bound on overlap (381) and the upper bound on max-weight matching (671.551) are marked.]
14. Algorithmic analysis
Exact runtime: matrix + matching, with matrix ≪ matching
    O(|EL| + |S|) + O(|EL| N log N)

Our approx. runtime: matrix + approx. matching
    O(|EL| + |S|) + O(|EL|)

Algorithmic parameters
    |EL|  number of edges in L
    |S|   number of potential overlaps
15. A local dominating edge method for bipartite matching
The method guarantees
• ½ approximation
• maximal matching
Based on work by Preis (1999), Manne and Bisseling (2008), and Halappanavar et al. (2012).

A locally dominating edge is an edge heavier than all neighboring edges.
For bipartite graphs: work on the smaller side only.
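The definition above translates directly into code. This sketch (hypothetical names `WEdge`, `heavier`, `locally_dominating_edges`) only identifies the locally dominating edges of an edge list; it breaks weight ties by endpoint ids so that "heavier" is a strict total order, an assumption the slide does not spell out. The globally heaviest edge is always locally dominating, so a non-empty graph always has at least one.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <vector>

struct WEdge { int u, v; double w; };

// Strict total order on edges: weight, then larger endpoint, then smaller one.
bool heavier(const WEdge& a, const WEdge& b) {
  if (a.w != b.w) return a.w > b.w;
  const int amax = std::max(a.u, a.v), bmax = std::max(b.u, b.v);
  if (amax != bmax) return amax > bmax;
  return std::min(a.u, a.v) > std::min(b.u, b.v);
}

// Returns indices of edges that are heavier than every edge sharing an endpoint.
std::vector<int> locally_dominating_edges(int n, const std::vector<WEdge>& edges) {
  std::vector<int> best(n, -1);  // best[x] = index of the heaviest edge touching x
  for (int k = 0; k < static_cast<int>(edges.size()); ++k)
    for (int x : {edges[k].u, edges[k].v}) {
      assert(0 <= x && x < n);
      if (best[x] == -1 || heavier(edges[k], edges[best[x]])) best[x] = k;
    }
  std::vector<int> dominating;
  for (int k = 0; k < static_cast<int>(edges.size()); ++k)
    if (best[edges[k].u] == k && best[edges[k].v] == k) dominating.push_back(k);
  return dominating;
}
```

Matching all locally dominating edges, deleting their endpoints and repeating is what yields the ½-approximate maximal matching; this snippet performs only the identification step.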
16. A local dominating edge method for bipartite matching
Queue all vertices
Until the queue is empty:
    In parallel over vertices:
        Match to the heavy edge, and if there's a conflict, check the winner and find an alternative for the loser
        Add the endpoints of non-dominating edges to the queue
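A minimal serial sketch of the queue-driven idea, assuming an adjacency-list graph and matching only positive-weight edges. The names `Graph`, `add_edge` and `local_dominant_matching` are hypothetical; the sketch does not restrict work to the smaller side and is not the authors' multithreaded code. Each vertex points at its heaviest unmatched neighbor; mutual pointers are locally dominating edges and get matched; when a vertex is matched, neighbors that pointed at it (the conflict "losers") look for an alternative.

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct Edge { int to; double w; };
using Graph = std::vector<std::vector<Edge>>;

void add_edge(Graph& g, int u, int v, double w) {
  g[u].push_back({v, w});
  g[v].push_back({u, w});
}

// Heaviest unmatched neighbor of v with positive weight, or -1. Ties go to the
// larger neighbor id, which is a consistent global order on edges.
int heaviest_unmatched(const Graph& g, const std::vector<int>& mate, int v) {
  int best = -1;
  double bw = 0.0;
  for (const Edge& e : g[v]) {
    if (e.w <= 0.0 || mate[e.to] != -1) continue;
    if (best == -1 || e.w > bw || (e.w == bw && e.to > best)) { best = e.to; bw = e.w; }
  }
  return best;
}

std::vector<int> local_dominant_matching(const Graph& g) {
  const int n = static_cast<int>(g.size());
  std::vector<int> mate(n, -1), cand(n, -1);
  std::deque<int> queue;  // newly matched vertices whose neighbors must re-check
  for (int v = 0; v < n; ++v) cand[v] = heaviest_unmatched(g, mate, v);
  for (int v = 0; v < n; ++v) {  // first pass: match all mutual pointers
    const int u = cand[v];
    if (u != -1 && mate[v] == -1 && cand[u] == v) {
      mate[v] = u; mate[u] = v;
      queue.push_back(v); queue.push_back(u);
    }
  }
  while (!queue.empty()) {
    const int u = queue.front(); queue.pop_front();
    for (const Edge& e : g[u]) {
      const int x = e.to;
      if (mate[x] != -1 || cand[x] != u) continue;  // x did not lose its choice
      cand[x] = heaviest_unmatched(g, mate, x);     // find an alternative
      const int y = cand[x];
      if (y != -1 && cand[y] == x) {                // new locally dominating edge
        mate[x] = y; mate[y] = x;
        queue.push_back(x); queue.push_back(y);
      }
    }
  }
  return mate;  // mate[v] = matched partner, or -1
}
```

On the first graph in the test, the sketch picks the single edge of weight 4 while the optimum (edges of weight 3 and 2) has weight 5, within the ½ guarantee.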
17. A local dominating edge method for bipartite matching
Customized first iteration (with all vertices)
Use OpenMP locks to update choices
Use sync_and_fetch_add for queue updates
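The queue update can be illustrated with an atomic fetch-and-add: each thread reserves a unique slot by incrementing a shared tail index, then writes only to its own slot. This sketch uses `std::atomic<int>::fetch_add` as a portable stand-in for the slide's fetch-and-add (GCC also provides `__sync_fetch_and_add` builtins) and `std::thread` instead of OpenMP; `SharedQueue` and `parallel_fill` are hypothetical, and the lock-protected choice updates are not shown.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Fixed-capacity queue filled concurrently; capacity must cover all pushes.
struct SharedQueue {
  std::vector<int> slots;
  std::atomic<int> tail{0};
  explicit SharedQueue(int capacity) : slots(capacity, -1) {}

  void push(int v) {
    // fetch_add returns the old tail, so each caller gets a distinct slot.
    const int pos = tail.fetch_add(1, std::memory_order_relaxed);
    assert(pos < static_cast<int>(slots.size()));
    slots[pos] = v;  // no other thread writes this slot
  }
  int size() const { return tail.load(); }
};

// Each thread pushes a disjoint range of vertex ids; join() makes the writes
// visible to the caller afterwards.
void parallel_fill(SharedQueue& q, int nthreads, int per_thread) {
  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; ++t)
    threads.emplace_back([&q, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) q.push(t * per_thread + i);
    });
  for (std::thread& th : threads) th.join();
}
```

The order of entries in the queue depends on thread timing, but every pushed vertex appears exactly once, which is all the matching loop needs.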
18. Remaining multi-threading procedures are straightforward
Standard OpenMP for matrix computations
    use schedule(dynamic) to handle skew
We can batch the matching procedures in the BP method for additional parallelism:
for i=1 to ...
    update x[i] to y[i]
    save y[i] in a buffer
    when the buffer is full, compute a max-weight match for all in the buffer and save the best
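The batching loop above can be sketched as a small buffer that scores candidate weight vectors in groups and keeps the best. `BatchedBest` is hypothetical, and `score` is a placeholder for "compute a (max-weight or approximate) matching with y and return its objective"; the sketch runs the batch serially and only marks where a parallel-for over the batch would go.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class BatchedBest {
 public:
  using Vec = std::vector<double>;
  BatchedBest(std::size_t batch, std::function<double(const Vec&)> score)
      : batch_(batch), score_(std::move(score)) {
    assert(batch_ > 0);
  }
  // Save y in the buffer; when the buffer is full, score the whole batch.
  void add(const Vec& y) {
    buffer_.push_back(y);
    if (buffer_.size() == batch_) flush();
  }
  // Score every buffered vector and keep the best one seen so far. The
  // evaluations are independent, so this loop is the one to parallelize.
  void flush() {
    for (const Vec& y : buffer_) {
      const double s = score_(y);
      if (!has_best_ || s > best_score_) { best_score_ = s; best_ = y; has_best_ = true; }
    }
    buffer_.clear();
  }
  bool has_best() const { return has_best_; }
  double best_score() const { return best_score_; }
  const Vec& best() const { return best_; }

 private:
  std::size_t batch_;
  std::function<double(const Vec&)> score_;
  std::vector<Vec> buffer_;
  Vec best_;
  double best_score_ = 0.0;
  bool has_best_ = false;
};
```

Batching is safe for BP because, as slide 12 notes, BP only saves the best result and does not feed the matching back into its next iteration; Klau's MR method cannot be batched this way.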
19. Real-world data sets
TABLE II. For each problem in our bioinformatics and ontology sets, we report the number of vertices in graph A and B, the number of edges in the graph L, and the number of nonzeros in S.

Problem        |VA|       |VB|       |EL|          |S|
dmela-scere    9,459      5,696      34,582        6,860
homo-musm      3,247      9,695      15,810        12,180
lcsh-wiki      297,266    205,948    4,971,629     1,785,310
lcsh-rameau    154,974    342,684    20,883,500    4,929,272

… order to match vertices. We experimented with an initialization algorithm tailored for bipartite graphs by spawning threads only from one of the vertex sets VA or VB to identify locally dominant edges. If the thread is responsible for matching a vertex in VA, then it has to check the adjacency sets of the vertices in VB that are adjacent to it in order to determine if the …
20. Performance evaluation
(2x4) 10-core Intel E7-8870 processors, 2.4 GHz (80 cores)
16 GB memory per processor (128 GB total)
[Figure: eight CPUs, each with its own local memory.]

Scaling study
1. Thread binding: scattered vs. compact
2. Memory binding: interleaved vs. bind
21. Scaling
BP with no batching, lcsh-rameau, 400 iterations
[Figure: speedup vs. threads (0 to 80), scatter and interleave.]
1450 seconds with 1 thread; 115 seconds with 40 threads
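The two timings on this slide give the 40-thread speedup and parallel efficiency directly:

```latex
\text{speedup}(40) = \frac{T_1}{T_{40}} = \frac{1450\,\mathrm{s}}{115\,\mathrm{s}} \approx 12.6,
\qquad
\text{efficiency}(40) = \frac{12.6}{40} \approx 0.32
```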
22. Scaling
BP with no batching
[Figure: speedup vs. threads (0 to 80) for compact and interleave, compact and membind, scatter and interleave, scatter and membind.]
23. Scaling
[Figure: three panels (BP with no batching; Klau's MR method; BP with batch=20), each plotting speedup vs. threads (0 to 80) for compact and interleave, compact and membind, scatter and interleave, scatter and membind.]
In all cases, we get a speedup of around 12-15 on 40 cores with scatter threads and interleaved memory.
24. Summary & Conclusions
• Tailored algorithm for approx. max-weight bipartite matching
• Algorithmic improvement in network alignment methods
• Multi-threaded C++ code for network alignment
415 seconds -> 10 seconds (40-times overall speedup)
For large problems, interactive network alignment is possible
Future work: memory control, improved methods
Code and data available! www.cs.purdue.edu/~dgleich/codes/netalignmc