Complex Networks and Data Mining on genetic databases
1. Who’s this guy?
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
Knowledge Discovery in Databases through
Complex Networks: application to
phylodynamics
Luiz Max F. de Carvalho
Scientific Computing Programme (PROCC), Fiocruz
Pan American Center for Foot-and-Mouth Disease
(PAHO/WHO)
WaFiS 2012
September 28, 2012
2. Outline
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
1 Knowledge Discovery in Databases (KDD)
2 Complex Networks
3 Example 1: Chitin pathway phylogeny
4 Example 2: Foot-and-mouth disease virus in South America
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
3. Knowledge Discovery in Databases (KDD)
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
Lots of data
human brain very limited processing capacity
Information → Knowledge
Increasing number of molecular data (sequences, 3D
structures, antigenicity,. . . )
Is it possible to explore these databases to discover useful
stuff?
4. Well. . . Let’s see
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
5. [We may use] Complex Networks
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
Graphs → G = (V , E )
6. Yeah, but how?
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
We can explore the ”dynamic signature” of these Complex
Networks, i.e., study and compare their structural properties.
Some useful formulas:
Clustering Coefficient < c >:
Degree distribution PK =
Diameter: max(d(i, j))
3×#triangles
#triples
∞
K =K
pK
7. Ok, Let’s work then
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
1
Grab n sequences;
2
Create an n × n matrix using some kind of (normalized)
distance (say, S);
3
For each σ ∈ [0, 1] build M(σ) such that:
mij (σ) =
1 if Sij > σ,
0 if Sij < σ.
In a sense, we are transforming a single network in a family of
networks.
8. Analysis
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
We shall explore the relationships between these networks:
First, define a higher-order neighborhood indicator function,
such that you binarize the adjacency matrix with regard the
ˆ
path length , obtaining a matrix M = D M( ). Then
=1
δ(α, β) =
1
N2
N
N
(
i=1 j=1
ˆ
mij (α) mij (β)
ˆ
−
)
D(α)
D(β)
Evaluating δ(σ, σ + ∆σ) can give some interesting insights.
(1)
9. Example 1: Chitin pathway phylogeny
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
Proteins related to the chitin metabolic pathway from
1605 complete genomes;
WaFiS 2012
BLAST distances (which are asymmetric);
Knowledge
Discovery in
Databases
(KDD)
Complex
Search for phylogenetic relationships
10. Example 1: Some results
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
11. Example 1: Some more results
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
12. Example 1: The expected Network(s)
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
13. Example 2: Foot-and-mouth disease virus in South
America
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
S was built with phylogenetic (TN93) distances for NT
and JTT distances for AA;
Try to make sense of a somewhat big data set (167 seqs);
Extract some nice patterns;
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
14. Indexes × σ
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
(a)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
(b)
15. A nice network
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
16. Some more developments
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
17. Related Work
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
Identify transmission clusters (HIV, HCV) (Lewis et al,
2008,Plos Medicine)
Explore scale-free behavior in phylodynamics (Shiino,
2012, Frontiers in Microbiology)
18. Future Directions
Knowledge
Discovery in
Databases
through
Complex
Networks:
application to
phylodynamics
Luiz Max F.
de Carvalho
Scientific
Computing
Programme
(PROCC),
Fiocruz
Pan American
Center for
Foot-andMouth Disease
(PAHO/WHO)
WaFiS 2012
Knowledge
Discovery in
Databases
(KDD)
Complex
Explore the spatial aspect in the construction of S
ˆ
Maybe S = µ + S(G )α
Power law analysis
Implement assortativity
Suggestions. . .