The power of graphs to analyze biological data
1. the power of graphs for analyzing biological datasets
Davy Suvee
Janssen Pharmaceutica
2. about me
who am i ...
➡ working as an it lead / software architect @ janssen pharmaceutica
• dealing with big scientific data sets
• hands-on expertise in big data and NoSQL technologies
➡ founder of datablend
• provide big data and NoSQL consultancy
• share practical knowledge and big data use cases via blog
@DSUVEE
3. outline
➡ getting visual insights into big data sets
★ gene expression clustering (mongodb, Neo4j, Gephi)
★ Mutation prevalence (cassandra, Neo4j, Gephi)
➡ fluxgraph, a time machine for your graphs ...
4. insights in big data
➡ typical approach through warehousing
★ star schema with fact tables and dimension tables
6. insights in big data
★ real-time visualization
★ filtering
★ metrics
★ layout
★ modular (1, 2)
1. http://gephi.org/plugins/neo4j-graph-database-support/
2. http://github.com/datablend/gephi-blueprints-plugin
7. gene expression clustering
➡ oncology data set:
★ 4,800 samples
★ 27,000 genes
➡ Question:
★ for a particular subset of samples, which genes are co-expressed?
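One way to make the co-expression question concrete is sketched below. The slides shown here do not state how co-expression is measured, so this sketch assumes Pearson correlation between expression profiles and a fixed threshold; the class name, method names and toy values are all hypothetical. Each pair above the threshold would become an edge in the graph that is then visualized.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: link genes whose expression profiles correlate strongly. */
class CoExpressionSketch {

    /** Pearson correlation of two equally long expression profiles. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** Returns index pairs {i, j} (i < j) of genes whose correlation is at least the threshold. */
    static List<int[]> coExpressedPairs(double[][] expression, double threshold) {
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i < expression.length; i++) {
            for (int j = i + 1; j < expression.length; j++) {
                if (pearson(expression[i], expression[j]) >= threshold) {
                    edges.add(new int[] {i, j});
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Toy values (assumption): rows are genes, columns are the selected samples
        double[][] expression = {
            {1.0, 2.0, 3.0, 4.0},
            {2.0, 4.0, 6.0, 8.0},
            {4.0, 1.0, 3.0, 2.0}
        };
        for (int[] edge : coExpressedPairs(expression, 0.9)) {
            System.out.println("gene " + edge[0] + " -- gene " + edge[1]);
        }
    }
}
```

With 27,000 genes the all-pairs loop covers roughly 364 million pairs, which is one reason to restrict the computation to a subset of samples and to store only the pairs that pass the threshold.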
12. graphs and time ...
➡ reproducible graph state
➡ towards a time-aware graph ...
➡ fluxgraph: a blueprints-compatible graph on top of Datomic
➡ make FluxGraph fully time-aware
★ travel your graph through time
★ time-scoped iteration of vertices and edges
★ temporal graph comparison
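17. travel through time
FluxGraph fg = new FluxGraph();
// Blueprints takes an element id as first argument; null lets the graph assign one
Vertex davy = fg.addVertex(null);
davy.setProperty("name","Davy");
Vertex peter = ...
Vertex michael = ...
Edge e1 = fg.addEdge(null, davy, peter,"knows");
(diagram: vertices Davy, Peter and Michael, with a "knows" edge from Davy to Peter)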
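21. travel through time
// remember the moment before the changes below
Date checkpoint = new Date();
davy.setProperty("name","David");
// Blueprints takes an element id as first argument; null lets the graph assign one
Edge e2 = fg.addEdge(null, davy, michael,"knows");
(diagram: the vertex formerly named Davy is now David, with "knows" edges to Peter and Michael)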
22. travel through time by default
(diagram: a time axis marked "checkpoint" and "current". At the checkpoint, Davy knows Peter and Michael is unconnected. At the current time, David knows Peter and Michael.)
23. travel through time
(diagram: a time axis marked "checkpoint" and "current". At the checkpoint, Davy knows Peter and Michael is unconnected. At the current time, David knows Peter and Michael.)
fg.setCheckpointTime(checkpoint);
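The idea of reading a graph "as of" a checkpoint can be illustrated without FluxGraph or Datomic. The sketch below is not FluxGraph's implementation; it is a minimal, hypothetical model in which every write to a property is kept under its timestamp and a read picks the latest write at or before the requested time. The class name and the use of logical timestamps (1, 2, 3) are assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a property whose past values stay readable. */
class VersionedProperty {
    // Every write is kept, keyed by the (logical) time at which it happened
    private final TreeMap<Long, String> history = new TreeMap<>();

    void set(long time, String value) {
        history.put(time, value);
    }

    /** Value as of the given time, or null if nothing had been written yet. */
    String getAsOf(long time) {
        Map.Entry<Long, String> entry = history.floorEntry(time);
        return entry == null ? null : entry.getValue();
    }

    /** Default read: the most recent value. */
    String getCurrent() {
        return history.isEmpty() ? null : history.lastEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedProperty name = new VersionedProperty();
        name.set(1L, "Davy");
        long checkpoint = 2L;              // taken before the rename
        name.set(3L, "David");
        System.out.println(name.getCurrent());         // prints "David"
        System.out.println(name.getAsOf(checkpoint));  // prints "Davy"
    }
}
```

Because nothing is overwritten, setting a checkpoint only changes which entry a read selects; the stored history itself stays untouched.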
24. time-scoped iteration
t1      t2       t3        tcurrent
Davy    Davy’    Davy’’    Davy’’’
(diagram: each change to the vertex produces the next version)
➡ how to find the version of the vertex you are interested in?
29. time-scoped iteration
t1      t2       t3        tcurrent
Davy    Davy’    Davy’’    Davy’’’
(diagram: "next" arrows link each version to its successor, "previous" arrows link each version to its predecessor)
Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
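The version chain behind these calls can be pictured as a doubly linked list in which every version also carries the interval during which it was valid. The sketch below is a hypothetical, self-contained model of that idea and not FluxGraph code: the class, its fields and the use of a `Predicate` as the filter are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: one snapshot in a chain of vertex versions. */
class VertexVersion {
    final String name;
    final long validFrom;            // time at which this version came into existence
    long validTo = Long.MAX_VALUE;   // exclusive end; MAX_VALUE means "still current"
    VertexVersion previous, next;

    VertexVersion(String name, long validFrom) {
        this.name = name;
        this.validFrom = validFrom;
    }

    /** Records a change: closes this version's interval and links a new version after it. */
    VertexVersion change(String newName, long time) {
        VertexVersion successor = new VertexVersion(newName, time);
        this.validTo = time;
        this.next = successor;
        successor.previous = this;
        return successor;
    }

    /** All earlier versions that pass the filter, newest first. */
    List<VertexVersion> previousVersions(Predicate<VertexVersion> filter) {
        List<VertexVersion> result = new ArrayList<>();
        for (VertexVersion v = previous; v != null; v = v.previous) {
            if (filter.test(v)) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        VertexVersion v1 = new VertexVersion("Davy", 1);
        VertexVersion v2 = v1.change("Davy'", 2);
        VertexVersion v3 = v2.change("Davy''", 3);
        System.out.println(v3.previous.name);                            // prints "Davy'"
        System.out.println(v3.previousVersions(v -> true).size());       // prints 2
        System.out.println("[" + v2.validFrom + ", " + v2.validTo + ")"); // prints "[2, 3)"
    }
}
```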
32. time-scoped iteration
➡ When does an element change?
➡ vertex:
★ setting or removing a property
★ adding it to or removing it from an edge
★ being removed
➡ edge:
★ setting or removing a property
★ being removed
➡ ... and each element is time-scoped!
35. temporal graph comparison
➡ difference (A, B) = union (A, B) - B
➡ ... as an (immutable) graph!
(diagram: the difference of two graph states yields a graph containing David and a "knows" edge)
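The difference formula can be tried out on plain sets. In the sketch below each graph state is reduced to a set of facts written as strings (a property value or an edge); that representation, the class name and the example facts are assumptions made for illustration and say nothing about how FluxGraph stores or compares graphs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of difference (A, B) = union (A, B) - B on sets of graph facts. */
class GraphDifferenceSketch {

    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.addAll(b);      // union (A, B)
        result.removeAll(b);   // ... minus B
        // The inputs are left untouched and the result is read-only
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        Set<String> checkpoint = new LinkedHashSet<>(Arrays.asList(
            "davy.name=Davy", "davy-knows->peter"));
        Set<String> current = new LinkedHashSet<>(Arrays.asList(
            "davy.name=David", "davy-knows->peter", "davy-knows->michael"));
        // prints "[davy.name=David, davy-knows->michael]"
        System.out.println(difference(current, checkpoint));
    }
}
```

The result holds exactly what is present in the first state but not in the second, here the renamed vertex and the new "knows" edge.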
36. use case: longitudinal patient data
(diagram: a patient vertex shown at times t1 to t5, with "smoking", "cancer" and "death" states attached at different points in time)
38. use case: longitudinal patient data
➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
➡ example analysis:
★ if a male patient is no longer smoking in 2005
★ what are the chances of getting lung cancer in 2010, comparing
• patients that smoked before 2005
• patients that never smoked
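41. use case: longitudinal patient data
➡ get all male non-smokers in 2005
// read the graph as it was at the end of 2005
fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
Iterator<Vertex> males =
    fg.getVertices("gender", "male").iterator();
while (males.hasNext()) {
    Vertex p2005 = males.next();
    // a patient smokes in 2005 if a "smokingStatus" edge exists at the checkpoint
    // (OUT: Blueprints' Direction.OUT, static import assumed)
    boolean smoking2005 =
        p2005.getEdges(OUT,"smokingStatus").iterator().hasNext();
}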
42. use case: longitudinal patient data
➡ which patients were smoking before 2005?
// true if any earlier version of the patient had a "smokingStatus" edge
boolean smokingBefore2005 =
    ((FluxVertex)p2005).getPreviousVersions(new TimeAwareFilter() {
        // keep a version by returning it, drop it by returning null
        public TimeAwareElement filter(TimeAwareVertex element) {
            return element.getEdges(OUT, "smokingStatus").iterator().hasNext()
                ? element : null;
        }
    }).iterator().hasNext();
44. use case: longitudinal patient data
➡ which patients have cancer in 2010?
// smokerws: working set of smokers
Graph g =
    fg.difference(smokerws,
        time2010.toDate(),
        time2005.toDate());
➡ extract the patients that have an edge to the cancer node
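The final comparison in this use case comes down to two rates: lung cancer in 2010 among male patients who had stopped smoking by 2005, and among male patients who never smoked. The sketch below shows only that last arithmetic step on flattened records. The `Patient` class, its boolean fields and the six toy records are hypothetical and made up for illustration; they are not results from the 15,000-patient data set.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: compare cancer rates of former smokers and never-smokers. */
class CohortComparisonSketch {

    /** Flattened view of what the graph queries would yield per patient (assumption). */
    static final class Patient {
        final boolean male, smokedBefore2005, smokingIn2005, cancerIn2010;

        Patient(boolean male, boolean smokedBefore2005, boolean smokingIn2005, boolean cancerIn2010) {
            this.male = male;
            this.smokedBefore2005 = smokedBefore2005;
            this.smokingIn2005 = smokingIn2005;
            this.cancerIn2010 = cancerIn2010;
        }
    }

    /** Fraction of male non-smokers in 2005, with the given smoking history, who have cancer in 2010. */
    static double cancerRate(List<Patient> patients, boolean smokedBefore2005) {
        int cohort = 0, cancer = 0;
        for (Patient p : patients) {
            if (p.male && !p.smokingIn2005 && p.smokedBefore2005 == smokedBefore2005) {
                cohort++;
                if (p.cancerIn2010) {
                    cancer++;
                }
            }
        }
        return cohort == 0 ? Double.NaN : (double) cancer / cohort;
    }

    public static void main(String[] args) {
        // Made-up toy records, not real patient data
        List<Patient> patients = Arrays.asList(
            new Patient(true, true, false, true),    // former smoker, cancer
            new Patient(true, true, false, false),   // former smoker, no cancer
            new Patient(true, false, false, false),  // never smoked
            new Patient(true, false, false, false),  // never smoked
            new Patient(true, true, true, true),     // still smoking in 2005: excluded
            new Patient(false, true, false, true));  // not male: excluded
        System.out.println("former smokers: " + cancerRate(patients, true));   // prints "former smokers: 0.5"
        System.out.println("never smoked:   " + cancerRate(patients, false));  // prints "never smoked:   0.0"
    }
}
```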