the power of graphs for analyzing biological datasets

Davy Suvee
Janssen Pharmaceutica
about me

who am i ...
➡ working as an IT lead / software architect @ Janssen Pharmaceutica
  • dealing with big scientific data sets
  • hands-on expertise in big data and NoSQL technologies
➡ founder of datablend
  • provide big data and NoSQL consultancy
  • share practical knowledge and big data use cases via blog

Davy Suvee (@DSUVEE)
outline


➡ getting visual insights into big data sets
  ★ gene expression clustering (MongoDB, Neo4j, Gephi)
  ★ mutation prevalence (Cassandra, Neo4j, Gephi)

➡ fluxgraph, a time machine for your graphs ...
insights in big data
➡ typical approach through warehousing
  ★ star schema with fact tables and dimension tables
insights in big data
  ★ real-time visualization
  ★ filtering
  ★ metrics
  ★ layouting
  ★ modular [1, 2]

[1] http://gephi.org/plugins/neo4j-graph-database-support/
[2] http://github.com/datablend/gephi-blueprints-plugin
gene expression clustering
➡ oncology data set:
  ★ 4,800 samples
  ★ 27,000 genes
➡ question:
  ★ for a particular subset of samples, which genes are co-expressed?
mongodb for storing gene expressions
{ "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3"} ,
  "sample_name" : "122551hp133a21.cel" ,
  "genomics_id" : 122551 ,
  "sample_id" : 343981 ,
  "donor_id" : 143981 ,
  "sample_type" : "Tissue" ,
  "sample_site" : "Ascending colon" ,
  "pathology_category" : "MALIGNANT" ,
  "pathology_morphology" : "Adenocarcinoma" ,
  "pathology_type" : "Primary malignant neoplasm of colon" ,
  "primary_site" : "Colon" ,
  "expressions" : [ { "gene" : "X1_at" , "expression" : 5.54217719084415} ,
                    { "gene" : "X10_at" , "expression" : 3.92335121981739} ,
                    { "gene" : "X100_at" , "expression" : 7.81638155662255} ,
                    { "gene" : "X1000_at" , "expression" : 5.44318512260619} ,
                     … ]
}
pearson correlation through map-reduce

   x    y
  43   99
  21   65
  25   79
  42   75
  57   87
  59   81

pearson correlation (x, y) = 0.52
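
As a minimal sketch of the calculation behind this slide, the plain-Java method below computes the Pearson coefficient for the x and y columns shown above. The class and method names are illustrative only, and the map-reduce job used in the talk is not shown in the slides. The method needs only five running sums, which are additive; that property is what would let such a computation be split across mappers and combined in a reducer.

```java
import java.util.Locale;

class PearsonSketch {

    // Pearson correlation coefficient of two equally long series.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator =
            Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // The six (x, y) pairs from the slide.
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};
        System.out.println(String.format(Locale.ROOT, "%.4f", pearson(x, y))); // prints 0.5298
    }
}
```

The result agrees with the value on the slide, which appears to be cut off after two decimals.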
co-expression graph


➡ create a node for each gene
➡ if correlation between two genes >= 0.8, draw an edge between both nodes
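
The two rules above can be sketched in plain Java as follows. This is an assumption-laden illustration, not the pipeline from the talk: the gene names and expression profiles are made up, the graph is reduced to a list of "geneA--geneB" edge labels instead of Neo4j nodes and relationships, and all pairs are compared in a simple double loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CoExpressionSketch {

    // Pearson correlation coefficient of two equally long series.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        return (n * sumXY - sumX * sumY)
            / Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
    }

    // One undirected edge per gene pair whose correlation reaches the threshold.
    static List<String> edges(Map<String, double[]> profiles, double threshold) {
        List<String> genes = new ArrayList<String>(profiles.keySet());
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < genes.size(); i++) {
            for (int j = i + 1; j < genes.size(); j++) {
                double r = pearson(profiles.get(genes.get(i)), profiles.get(genes.get(j)));
                if (r >= threshold) {
                    result.add(genes.get(i) + "--" + genes.get(j));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical expression profiles over four samples (not data from the talk).
        Map<String, double[]> profiles = new LinkedHashMap<String, double[]>();
        profiles.put("geneA", new double[]{1.0, 2.0, 3.0, 4.0});
        profiles.put("geneB", new double[]{2.1, 3.9, 6.2, 8.0});
        profiles.put("geneC", new double[]{4.0, 1.0, 3.5, 2.0});
        System.out.println(edges(profiles, 0.8)); // prints [geneA--geneB]
    }
}
```

With 27,000 genes the double loop would visit roughly 364 million pairs, which gives a sense of why a distributed computation is attractive here.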
graphs and time ...
➡ reproducible graph state

➡ towards a time-aware graph ...

➡ fluxgraph: a blueprints-compatible graph on top of Datomic

➡ make FluxGraph fully time-aware
   ★ travel your graph through time
   ★ time-scoped iteration of vertices and edges
   ★ temporal graph comparison
travel through time
FluxGraph fg = new FluxGraph();

Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex peter = ...
Vertex michael = ...

Edge e1 = fg.addEdge(davy, peter, "knows");

[figure: vertices Davy, Peter and Michael, with a "knows" edge from Davy to Peter]
travel through time
Date checkpoint = new Date();

davy.setProperty("name", "David");

Edge e2 = fg.addEdge(davy, michael, "knows");

[figure: the Davy vertex is now named David and has "knows" edges to both Peter and Michael]
travel through time
[figure: a timeline with two graph states. At "checkpoint": Davy knows Peter, Michael has no edges. At "current": David knows Peter and Michael. By default the graph is read at "current".]

fg.setCheckpointTime(checkpoint);
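
To make the idea of reading a graph "as of" a checkpoint concrete, here is a small self-contained sketch for a single property. It is not how FluxGraph or Datomic work internally; it only assumes that every write is kept with its timestamp, so that a read at time t returns the latest write at or before t. The class name, method names and the numeric timestamps are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class TimeTravelSketch {

    // Every write is kept, keyed by the time at which it happened.
    private final TreeMap<Long, String> history = new TreeMap<Long, String>();

    void set(long time, String value) {
        history.put(time, value);
    }

    // Value as of the given time: the latest write at or before it, or null if none.
    String valueAt(long time) {
        Map.Entry<Long, String> entry = history.floorEntry(time);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TimeTravelSketch name = new TimeTravelSketch();
        name.set(10L, "Davy");
        long checkpoint = 20L;   // taken between the two writes
        name.set(30L, "David");
        System.out.println(name.valueAt(checkpoint));     // prints Davy
        System.out.println(name.valueAt(Long.MAX_VALUE)); // prints David
    }
}
```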
time-scoped iteration
[figure: timeline t1, t2, t3, tcurrent; each change produces a new version of the vertex: Davy, Davy’, Davy’’, Davy’’’]

➡ how to find the version of the vertex you are interested in?
time-scoped iteration
[figure: the versions Davy, Davy’, Davy’’, Davy’’’ at t1, t2, t3, tcurrent, linked by "next" and "previous" pointers]

Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimeInterval();
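
One way to picture the version chain is as a doubly linked list in which each version carries the interval during which it was valid. The sketch below is plain Java under that assumption; the `Version` class, its fields and the numeric timestamps are hypothetical and are not FluxGraph types.

```java
import java.util.ArrayList;
import java.util.List;

class VersionChainSketch {

    // One version of an element, valid from 'start' (inclusive) to 'end' (exclusive).
    static final class Version {
        final String value;
        final long start;
        final long end;      // Long.MAX_VALUE marks the current version
        Version previous;    // null for the first version
        Version next;        // null for the current version

        Version(String value, long start, long end) {
            this.value = value;
            this.start = start;
            this.end = end;
        }
    }

    // Builds the chain from ascending change times and the values written at those times.
    static List<Version> chain(long[] changeTimes, String[] values) {
        List<Version> versions = new ArrayList<Version>();
        for (int i = 0; i < changeTimes.length; i++) {
            long end = (i + 1 < changeTimes.length) ? changeTimes[i + 1] : Long.MAX_VALUE;
            Version v = new Version(values[i], changeTimes[i], end);
            if (!versions.isEmpty()) {
                Version prev = versions.get(versions.size() - 1);
                prev.next = v;
                v.previous = prev;
            }
            versions.add(v);
        }
        return versions;
    }

    public static void main(String[] args) {
        List<Version> davy = chain(
            new long[]{1, 2, 3, 4},
            new String[]{"Davy", "Davy'", "Davy''", "Davy'''"});
        Version current = davy.get(davy.size() - 1);
        System.out.println(current.previous.value);                   // prints Davy''
        System.out.println(davy.get(0).start + ".." + davy.get(0).end); // prints 1..2
    }
}
```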
time-scoped iteration
➡ when does an element change?

➡ vertex:
   ★ setting or removing a property
   ★ being added to or removed from an edge
   ★ being removed

➡ edge:
   ★ setting or removing a property
   ★ being removed

➡ ... and each element is time-scoped!
temporal graph comparison
[figure: the "current" graph (David knows Peter and Michael) next to the "checkpoint" graph (Davy knows Peter): what changed?]
temporal graph comparison
➡ difference (A , B) = union (A , B) - B
➡ ... as an (immutable) graph!

[figure: the difference of the two graph states, shown as the vertex David with one "knows" edge]
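
The difference formula can be tried out on a toy model in which a graph state is just a set of facts. This is a sketch under that simplifying assumption, not FluxGraph's implementation: the fact strings such as "davy-knows->peter" are made up for illustration, and the immutable result is an unmodifiable set instead of a graph.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class GraphDifferenceSketch {

    // difference(A, B) = union(A, B) - B, returned as a read-only set.
    // The inputs are copied, so neither A nor B is modified.
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> union = new LinkedHashSet<String>(a);
        union.addAll(b);
        union.removeAll(b);
        return Collections.unmodifiableSet(union);
    }

    public static void main(String[] args) {
        // Hypothetical fact sets for the two graph states of the running example.
        Set<String> checkpoint = new LinkedHashSet<String>(Arrays.asList(
            "davy.name=Davy", "davy-knows->peter"));
        Set<String> current = new LinkedHashSet<String>(Arrays.asList(
            "davy.name=David", "davy-knows->peter", "davy-knows->michael"));
        // prints [davy.name=David, davy-knows->michael]
        System.out.println(difference(current, checkpoint));
    }
}
```

On plain sets the formula reduces to "everything in A that is not in B", which here is the renamed vertex and the new edge.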
use case: longitudinal patient data
[figure: one patient vertex at times t1 to t5, linked to "smoking" at t2 and t3, to "cancer" at t4 and t5, and to "death" at t5]
use case: longitudinal patient data

➡ historical data for 15,000 patients over a period of 10 years (2001-2010)

➡ example analysis:
   ★ if a male patient is not smoking in 2005,
   ★ what are the chances of getting lung cancer in 2010, comparing
        patients that smoked before 2005
        patients that never smoked
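use case: longitudinal patient data
➡ get all male non-smokers in 2005

// assuming Joda-Time: DateTime has no (year, month, day) constructor,
// so hour and minute are passed as well
fg.setCheckpointTime(new DateTime(2005, 12, 31, 0, 0).toDate());

Iterator<Vertex> males =
  fg.getVertices("gender", "male").iterator();

while (males.hasNext()) {
   Vertex p2005 = males.next();
   // OUT is Blueprints' Direction.OUT (static import); an outgoing
   // "smokingStatus" edge means the patient smokes at the checkpoint
   boolean smoking2005 =
     p2005.getEdges(OUT, "smokingStatus").iterator().hasNext();
   // keep p2005 for the next step only when smoking2005 is false
}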
use case: longitudinal patient data
➡ which patients were smoking before 2005?


boolean smokingBefore2005 =
  ((FluxVertex)p2005).getPreviousVersions(new TimeAwareFilter() {

    public TimeAwareElement filter(TimeAwareVertex element) {
      return element.getEdges(OUT, "smokingStatus").iterator().hasNext()
        ? element : null;
    }

  }).iterator().hasNext();
use case: longitudinal patient data
➡ which patients have cancer in 2010?

Graph g =
  fg.difference(smokerws,            // working set of smokers
                time2010.toDate(),
                time2005.toDate());

➡ extract the patients that have an edge to the cancer node
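
Once the two groups have been extracted, the comparison itself is a pair of proportions. The sketch below shows only that last step, in plain Java. The `Patient` class and the four sample records are invented for illustration and are not results from the data set described in the talk.

```java
import java.util.Arrays;
import java.util.List;

class CohortSketch {

    // A male patient who is a non-smoker in 2005, reduced to the two flags the analysis needs.
    static final class Patient {
        final boolean smokedBefore2005;
        final boolean cancerIn2010;

        Patient(boolean smokedBefore2005, boolean cancerIn2010) {
            this.smokedBefore2005 = smokedBefore2005;
            this.cancerIn2010 = cancerIn2010;
        }
    }

    // Fraction of the selected group (former smokers or never-smokers) with cancer in 2010.
    static double cancerRate(List<Patient> patients, boolean smokedBefore2005) {
        int total = 0;
        int withCancer = 0;
        for (Patient p : patients) {
            if (p.smokedBefore2005 == smokedBefore2005) {
                total++;
                if (p.cancerIn2010) {
                    withCancer++;
                }
            }
        }
        return total == 0 ? Double.NaN : (double) withCancer / total;
    }

    public static void main(String[] args) {
        // Made-up records, purely to exercise the method.
        List<Patient> patients = Arrays.asList(
            new Patient(true, true),
            new Patient(true, false),
            new Patient(false, false),
            new Patient(false, false));
        System.out.println("former smokers: " + cancerRate(patients, true));
        System.out.println("never smoked:   " + cancerRate(patients, false));
    }
}
```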
Questions?

The power of graphs to analyze biological data

  • 1. the power of graphs for analyzing biological datasets Davy Suvee Janssen Pharmaceutica
  • 2. about me who am i ... ➡ working as an it lead / software architect @ janssen pharmaceutica • dealing with big scientific data sets • hands-on expertise in big data and NoSQL technologies ➡ founder of datablend • provide big data and NoSQL consultancy Davy Suvee • share practical knowledge and big data use cases via blog @DSUVEE
  • 3. outline ➡ getting visual insights into big data sets ★ gene expression clustering (mongodb, Neo4j, Gephi) ★ Mutation prevalence (cassandra, Neo4j, Gephi) ➡ fluxgraph, a time machine for you graphs ...
  • 4. insights in big data ➡ typical approach through warehousing ★ star schema with fact tables and dimension tables
  • 5. insights in big data ➡ typical approach through warehousing ★ star schema with fact tables and dimension tables
  • 6. insights in big data ★ real-time visualization ★ filtering ★ metrics ★ layouting 1, 2 ★ modular 1. http://gephi.org/plugins/neo4j-graph-database-support/ 2. http://github.com/datablend/gephi-blueprints-plugin
  • 7. gene expression clustering ➡ oncology data set: ★ 4.800 samples ★ 27.000 genes ➡ Question: ★ for a particular subset of samples, which genes are co-expressed?
  • 8. mongodb for storing gene expressions { "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3"} ,   "sample_name" : "122551hp133a21.cel" ,   "genomics_id" : 122551 ,   "sample_id" : 343981 ,   "donor_id" : 143981 ,   "sample_type" : "Tissue" ,   "sample_site" : "Ascending colon" ,   "pathology_category" : "MALIGNANT" ,   "pathology_morphology" : "Adenocarcinoma" ,   "pathology_type" : "Primary malignant neoplasm of colon" ,   "primary_site" : "Colon" ,   "expressions" : [ { "gene" : "X1_at" , "expression" : 5.54217719084415} ,                     { "gene" : "X10_at" , "expression" : 3.92335121981739} ,                     { "gene" : "X100_at" , "expression" : 7.81638155662255} ,                     { "gene" : "X1000_at" , "expression" : 5.44318512260619} ,                      … ] }
  • 9. pearson correlation through map-reduce x y pearson correlation 43 99 21 65 25 79 0,52 42 75 57 87 59 81
  • 10. co-expression graph ➡ create a node for each gene ➡ if correlation between two genes >= 0.8, draw an edge between both nodes
  • 12. graphs and time ... ➡ reproducible graph state ➡ towards a time-aware graph ... ➡ fluxgraph: a blueprints-compatible graph on top of Datomic ➡ make FluxGraph fully time-aware ★ travel your graph through time ★ time-scoped iteration of vertices and edges ★ temporal graph comparison
  • 13. travel through time FluxGraph fg = new FluxGraph();
  • 14. travel through time FluxGraph fg = new FluxGraph(); Vertex davy = fg.addVertex(); davy.setProperty("name", "Davy"); [graph: Davy]
  • 15. travel through time FluxGraph fg = new FluxGraph(); Vertex davy = fg.addVertex(); davy.setProperty("name", "Davy"); Vertex peter = ... [graph: Davy; Peter]
  • 16. travel through time FluxGraph fg = new FluxGraph(); Vertex davy = fg.addVertex(); davy.setProperty("name", "Davy"); Vertex peter = ... Vertex michael = ... [graph: Davy; Peter; Michael]
  • 17. travel through time FluxGraph fg = new FluxGraph(); Vertex davy = fg.addVertex(); davy.setProperty("name", "Davy"); Vertex peter = ... Vertex michael = ... Edge e1 = fg.addEdge(davy, peter, "knows"); [graph: Davy -knows-> Peter; Michael]
  • 18. travel through time Date checkpoint = new Date(); [graph: Davy -knows-> Peter; Michael]
  • 19. travel through time Date checkpoint = new Date(); davy.setProperty("name", "David"); [graph: Davy -knows-> Peter; Michael]
  • 20. travel through time Date checkpoint = new Date(); davy.setProperty("name", "David"); [graph: David -knows-> Peter; Michael]
  • 21. travel through time Date checkpoint = new Date(); davy.setProperty("name", "David"); Edge e2 = fg.addEdge(davy, michael, "knows"); [graph: David -knows-> Peter; David -knows-> Michael]
  • 22. travel through time [timeline: checkpoint state (Davy -knows-> Peter; Michael) and current state (David -knows-> Peter; David -knows-> Michael); by default the graph is read at the current state]
  • 23. travel through time fg.setCheckpointTime(checkpoint); [timeline: checkpoint state (Davy -knows-> Peter; Michael) and current state (David -knows-> Peter; David -knows-> Michael)]
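To make the checkpoint idea concrete, here is a small sketch of a single time-aware property. It is not FluxGraph or Datomic code: `TimeTravelSketch` is a hypothetical class that keeps every value with the time it was written and, like `setCheckpointTime` on the slides, reads the value that was valid at the chosen checkpoint. Explicit `long` timestamps stand in for `Date` to keep the example deterministic.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration (not the FluxGraph implementation).
final class TimeTravelSketch {

    // Every write is kept, keyed by the time at which it happened.
    private final NavigableMap<Long, String> history = new TreeMap<>();

    // null means "no checkpoint set": read the current (latest) value.
    private Long checkpoint = null;

    void setProperty(long time, String value) {
        history.put(time, value);
    }

    void setCheckpointTime(Long time) {
        checkpoint = time;
    }

    /** Returns the value valid at the checkpoint, or null if none existed yet. */
    String getProperty() {
        Map.Entry<Long, String> entry = (checkpoint == null)
            ? history.lastEntry()
            : history.floorEntry(checkpoint); // latest write at or before the checkpoint
        return entry == null ? null : entry.getValue();
    }
}
```

Writing "Davy" at time 1 and "David" at time 3 and then setting the checkpoint to 2 makes `getProperty()` return "Davy" again, which is the behaviour slides 22 and 23 illustrate for the whole graph.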
  • 24. time-scoped iteration [timeline: Davy (t1) -change-> Davy’ (t2) -change-> Davy’’ (t3) -change-> Davy’’’ (tcurrent)] ➡ how to find the version of the vertex you are interested in?
  • 25. time-scoped iteration [timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next / previous]
  • 26. time-scoped iteration [timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next / previous] Vertex previousDavy = davy.getPreviousVersion();
  • 27. time-scoped iteration [timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next / previous] Vertex previousDavy = davy.getPreviousVersion(); Iterable<Vertex> allDavy = davy.getNextVersions();
  • 28. time-scoped iteration [timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next / previous] Vertex previousDavy = davy.getPreviousVersion(); Iterable<Vertex> allDavy = davy.getNextVersions(); Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
  • 29. time-scoped iteration [timeline: Davy (t1), Davy’ (t2), Davy’’ (t3), Davy’’’ (tcurrent), linked by next / previous] Vertex previousDavy = davy.getPreviousVersion(); Iterable<Vertex> allDavy = davy.getNextVersions(); Iterable<Vertex> selDavy = davy.getPreviousVersions(filter); Interval valid = davy.getTimerInterval();
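The next / previous navigation can be pictured as a doubly linked chain of versions. The sketch below is an assumption-laden illustration, not FluxGraph code: `VersionSketch` is a hypothetical class, and it uses a `java.util.function.Predicate` as the filter, which is a simplification of the filter object the slides pass to `getPreviousVersions`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of a version chain (not the FluxGraph implementation).
final class VersionSketch {
    final String name;
    final long validFrom;
    private VersionSketch previous;
    private VersionSketch next;

    VersionSketch(String name, long validFrom) {
        this.name = name;
        this.validFrom = validFrom;
    }

    /** Records a change: links a newer version after this one and returns it. */
    VersionSketch change(String newName, long time) {
        VersionSketch newer = new VersionSketch(newName, time);
        newer.previous = this;
        this.next = newer;
        return newer;
    }

    VersionSketch getPreviousVersion() {
        return previous;
    }

    /** All later versions, oldest first. */
    List<VersionSketch> getNextVersions() {
        List<VersionSketch> result = new ArrayList<>();
        for (VersionSketch v = next; v != null; v = v.next) {
            result.add(v);
        }
        return result;
    }

    /** Earlier versions that satisfy the filter, most recent first. */
    List<VersionSketch> getPreviousVersions(Predicate<VersionSketch> filter) {
        List<VersionSketch> result = new ArrayList<>();
        for (VersionSketch v = previous; v != null; v = v.previous) {
            if (filter.test(v)) {
                result.add(v);
            }
        }
        return result;
    }
}
```

Each call to `change` corresponds to one of the "change" arrows on slide 24, and walking `previous` or `next` answers the question of how to reach the version of interest.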
  • 30. time-scoped iteration ➡ When does an element change? ➡ vertex: ★ setting or removing a property ★ being added to or removed from an edge ★ being removed
  • 31. time-scoped iteration ➡ When does an element change? ➡ vertex: ★ setting or removing a property ★ being added to or removed from an edge ★ being removed ➡ edge: ★ setting or removing a property ★ being removed
  • 32. time-scoped iteration ➡ When does an element change? ➡ vertex: ★ setting or removing a property ★ being added to or removed from an edge ★ being removed ➡ edge: ★ setting or removing a property ★ being removed ➡ ... and each element is time-scoped!
  • 33. temporal graph comparison ➡ what changed? [checkpoint graph: Davy -knows-> Peter; Michael | current graph: David -knows-> Peter; David -knows-> Michael]
  • 34. temporal graph comparison ➡ difference (A , B) = union (A , B) - B ➡ ... as an (immutable) graph!
  • 35. temporal graph comparison ➡ difference (A , B) = union (A , B) - B ➡ ... as an (immutable) graph! ➡ difference ( [graph] , [graph] ) = [graph: David with a knows edge]
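The formula on slide 34 can be tried out on plain sets. This is a sketch only: it treats a graph as a set of elements (here, edge labels as strings) and ignores vertex properties, and `GraphDiffSketch` is a hypothetical name, not part of FluxGraph. Note that union(A, B) - B is simply everything in A that is not in B.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of difference(A, B) = union(A, B) - B on plain sets.
final class GraphDiffSketch {

    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.addAll(b);    // union(A, B)
        result.removeAll(b); // ... minus B
        // Returned read-only, echoing the "immutable graph" on the slide.
        return Collections.unmodifiableSet(result);
    }
}
```

With A as the current edges and B as the edges at the checkpoint, only the edge added after the checkpoint remains in the result.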
  • 36. use case: longitudinal patient data [timeline t1 .. t5: one patient vertex per time point, with smoking, cancer and death nodes attached at some of them]
  • 37. use case: longitudinal patient data ➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
  • 38. use case: longitudinal patient data ➡ historical data for 15,000 patients over a period of 10 years (2001-2010) ➡ example analysis: ★ if a male patient is no longer smoking in 2005 ★ what are the chances of getting lung cancer in 2010, comparing patients who smoked before 2005 with patients who never smoked
  • 39. use case: longitudinal patient data ➡ get all male non-smokers in 2005 fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
  • 40. use case: longitudinal patient data ➡ get all male non-smokers in 2005 fg.setCheckpointTime(new DateTime(2005,12,31).toDate()); Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();
  • 41. use case: longitudinal patient data ➡ get all male non-smokers in 2005 fg.setCheckpointTime(new DateTime(2005,12,31).toDate()); Iterator<Vertex> males = fg.getVertices("gender", "male").iterator(); while (males.hasNext()) { Vertex p2005 = males.next(); boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext(); }
  • 42. use case: longitudinal patient data ➡ which patients were smoking before 2005? boolean smokingBefore2005 = ((FluxVertex)p2005).getPreviousVersions(new TimeAwareFilter() { public TimeAwareElement filter(TimeAwareVertex element) { return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null; } }).iterator().hasNext();
  • 43. use case: longitudinal patient data ➡ which patients have cancer in 2010? Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate()); (smokerws: working set of smokers)
  • 44. use case: longitudinal patient data ➡ which patients have cancer in 2010? Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate()); (smokerws: working set of smokers) ➡ extract the patients that have an edge to the cancer node