HIG Project Overview

           August 31, 2012




    Matthieu-P. Schapranow
    Hasso Plattner Institute
Chair of Prof. Hasso Plattner
Vision: Real-time Analysis of Genomic Data to Improve Medical Treatment




    HIG Project Overview, M. Schapranow, Aug 31, 2012
Build up the Whole Picture out of Layers

      ■  Data:
           □  Combine research findings from int’l scientific databases in a
              single system at HPI
      ■  Platform:
           □  Expose information as a service to be consumed by special-purpose
              applications
      ■  Applications:
           □  Support the genome alignment pipeline through massively parallel
              execution of:
                □ Alignment algorithms, e.g. BWA, BT2 (Bowtie2), etc.
                □ Variant calling
           □  Analyze individual patient results (real-time annotations with
              combined data)
           □  Analyze patient cohorts using individual filters
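The "massively parallel execution" above rests on splitting the input reads into chunks, processing the chunks concurrently, and merging the partial results in their original order. The following is a minimal single-machine sketch of that pattern under stated assumptions: `align_chunk` is a hypothetical stand-in for a real aligner call, and a thread pool replaces the cluster-wide distribution the slides describe; a CPU-bound aligner would in practice run in separate processes or on separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_chunks(items: Sequence[T], n_chunks: int) -> List[Sequence[T]]:
    """Split items into at most n_chunks contiguous slices of nearly equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, remainder = divmod(len(items), n_chunks)
    chunks: List[Sequence[T]] = []
    start = 0
    for i in range(n_chunks):
        # The first `remainder` chunks receive one extra item each.
        end = start + size + (1 if i < remainder else 0)
        if end > start:  # skip empty chunks when there are fewer items than chunks
            chunks.append(items[start:end])
        start = end
    return chunks


def run_in_parallel(
    items: Sequence[T], task: Callable[[Sequence[T]], List[R]], n_workers: int
) -> List[R]:
    """Apply `task` to each chunk concurrently and concatenate results in input order."""
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        partial_results = list(pool.map(task, chunks))
    return [result for part in partial_results for result in part]


def align_chunk(reads: Sequence[str]) -> List[int]:
    # Hypothetical stand-in for an aligner call: returns each read's length.
    return [len(read) for read in reads]
```

For example, `run_in_parallel(["ACGT", "GGC", "TTTTA", "CA", "G"], align_chunk, 2)` processes the chunks `["ACGT", "GGC", "TTTTA"]` and `["CA", "G"]` concurrently and returns `[4, 3, 5, 2, 1]`. Keeping chunks contiguous means results can be concatenated without re-sorting.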
How the Vision Becomes Real


      ■  Platform:
           □  Worker Framework: Enables parallel execution of tasks
              (alignment, variant calling) across node boundaries
           □  Updating Framework: Periodically retrieves updates from
              international databases and automatically integrates them into
              the local store
      ■  Applications:
           □  Alignment Coordinator: Submit alignment tasks and retrieve
              mutation lists, e.g. as CSV
           □  Genome Browser: Interactive browsing of the reference genome and
              individual patient genomes
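The Alignment Coordinator returns mutation lists, for example as CSV. The slides do not specify the file layout, so the sketch below assumes a hypothetical four-column layout (chromosome, 1-based position, reference base, alternative base) and uses only Python's standard `csv` module.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Mutation:
    # Hypothetical record layout; the slides do not define the CSV columns.
    chromosome: str
    position: int  # 1-based position on the reference genome
    reference: str
    alternative: str


def mutations_to_csv(mutations: Iterable[Mutation]) -> str:
    """Serialize mutations as CSV text with a header row, in the given order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["chromosome", "position", "reference", "alternative"])
    for m in mutations:
        writer.writerow([m.chromosome, m.position, m.reference, m.alternative])
    return buffer.getvalue()
```

Returning a string keeps the function easy to test; a real service would more likely stream rows directly to a file or HTTP response.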



Alignment Coordinator


      ■  Available Alignment Algorithms (and growing)
           □  Bowtie2
           □  Bowtie
           □  BWA
           □  TMAP
           □  SNAP
           □  MAQ
           □  SOAP
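A coordinator that supports several aligners needs some way to map an algorithm name to a concrete invocation. The sketch below is a hypothetical registry, not the HIG implementation: it covers only Bowtie2 and BWA (via `bwa mem`) and uses only their well-known options (`-x`, `-U`, `-p` for Bowtie2; `-t` for BWA-MEM); both tools write SAM to stdout when no output file is given. The remaining aligners are omitted rather than guessing their command lines.

```python
import subprocess
from typing import Callable, Dict, List


def bowtie2_command(index: str, reads: str, threads: int) -> List[str]:
    # Bowtie2: -x index basename (built by bowtie2-build), -U unpaired reads,
    # -p worker threads; without -S the SAM output goes to stdout.
    return ["bowtie2", "-p", str(threads), "-x", index, "-U", reads]


def bwa_mem_command(index: str, reads: str, threads: int) -> List[str]:
    # BWA-MEM: index prefix (built by `bwa index`), reads file, -t worker threads;
    # SAM output goes to stdout.
    return ["bwa", "mem", "-t", str(threads), index, reads]


# Hypothetical registry: only two of the listed aligners are sketched here.
ALIGNERS: Dict[str, Callable[[str, str, int], List[str]]] = {
    "Bowtie2": bowtie2_command,
    "BWA": bwa_mem_command,
}


def build_command(aligner: str, index: str, reads: str, threads: int = 1) -> List[str]:
    try:
        builder = ALIGNERS[aligner]
    except KeyError:
        raise ValueError(f"unsupported aligner: {aligner}") from None
    return builder(index, reads, threads)


def run_alignment(aligner: str, index: str, reads: str, sam_path: str, threads: int = 1) -> None:
    # Requires the aligner binary on PATH and a prebuilt index.
    cmd = build_command(aligner, index, reads, threads)
    with open(sam_path, "w") as sam_file:
        subprocess.run(cmd, stdout=sam_file, check=True)
```

Separating command construction from execution lets the argument lists be checked without the aligners installed; adding another algorithm means registering one more builder function.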




Numbers you should know
    Alignment Execution Time


      ■  One cell line: ~600k reads / 110 MB
      ■  Pipeline: Alignment and variant calling

           Property          Traditional       HPI
           Full Genome       No                Yes
           Cores             2 × 6 cores       25 × 40 cores
           Main Memory       48 GB             25 TB
           Runtime           ~720              ~40 s




Numbers you should know
    History of the Human Genome Project


      ■  1984: Idea of a global Human Genome (HG) project discussed at the
         Alta Summit: “DNA available on the Internet”
      ■  1990: 15-year HG project launched in the US (USD 3 billion funding)
      ■  2000: Rough draft of the HG announced
      ■  2003: Complete genome sequenced
      ■  2006: chr1, the longest chromosome, is the last to be sequenced


      ■  … what’s next?




Numbers you should know
    Human Genome


      Entity                              Cardinality
      Different Bases                     4 (A, C, G, T)
      Base Pairs                          3.137 Gbp (billion base pairs)
      Chromosomes                         23
      Distinct Genes                      20k-25k
      Amino Acids (coded as triplets)     21
      Proteins                            50k-300k

      Taken from http://de.wikipedia.org/wiki/Code-Sonne
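Two numbers in the table translate directly into storage arithmetic: 4 distinct bases need only 2 bits each, and reading them as triplets yields 4³ = 64 possible codons for the amino acids and stop signals. The sketch below is an illustrative 2-bit packing, an assumption for this example rather than the storage format used by the HIG platform; it ignores ambiguous bases such as N, which real reference files contain.

```python
from itertools import product

# 2 bits per base; this sketch assumes only A, C, G, T (no N or other IUPAC codes).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack(sequence: str) -> bytes:
    """Pack a DNA sequence at 4 bases per byte, first base in the lowest bits."""
    packed = bytearray((len(sequence) + 3) // 4)
    for i, base in enumerate(sequence):
        packed[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(packed)


def unpack(packed: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        BITS_TO_BASE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed to store n_bases at 2 bits per base."""
    return (n_bases + 3) // 4


# 4 bases read as triplets give 4**3 = 64 possible codons.
CODONS = ["".join(triplet) for triplet in product("ACGT", repeat=3)]
```

With the table's 3.137 Gbp, `packed_size_bytes(3_137_000_000)` is 784,250,000 bytes, roughly 784 MB for one haploid genome, small compared with the 1 TB of main memory per node listed under the hardware characteristics.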

Numbers you should know
    Comparison of Costs

      [Figure: Comparison of Costs for Main Memory and Genome Analysis.
      Costs in USD (log scale, 0.01 to 10,000) from Jan 2001 to Jan 2012;
      series: Costs per Megabyte RAM, Costs per Megabase Sequencing]
Hardware Characteristics


       ■  1,000-core cluster with 25 TB main memory
       ■  Consists of 25 identical nodes, each with:
            □  40 cores (80 hardware threads)
            □  1 TB main memory
            □  Intel® Xeon® E7-4870
            □  2.40 GHz
            □  30 MB cache




Customer Process as of Today


       ■  Tissue sequencing in the context of cancer treatment
       ■  Complex and time-consuming, with media breaks and manual steps




Project Objectives


       ■  Alignment of DNA reads (FASTQ) against a reference genome
          (FASTA) → mapped reads
       ■  Real-time analysis of mapped reads
            □  Detection of mutations (SNPs, INDELs)
            □  Comparison of multiple tissues
            □  Detection of similar clusters to identify correlations
       ■  Analysis of mutations
            □  Link mutations to scientific references (existing
               knowledge)
            □  Detection of similar clusters to identify correlations
            □  Identify genes and regulators for certain phenotypic
               characteristics, e.g. “fast-running horses”
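Detecting mutations in mapped reads starts from a pileup: for every reference position, count the bases that aligned reads report there. The sketch below is a deliberately naive single-nucleotide caller under explicit assumptions: alignments are hypothetical `(0-based start, read sequence)` pairs without gaps, so INDELs are out of scope, and a position is reported when the most frequent non-reference base reaches simple depth and fraction thresholds. Production variant callers additionally model base qualities, mapping qualities, and genotype likelihoods.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (0-based start on the reference, read sequence); hypothetical, gap-free alignments.
Alignment = Tuple[int, str]
# (1-based position, reference base, alternative base, depth, alt fraction)
Call = Tuple[int, str, str, int, float]


def call_snvs(
    reference: str,
    alignments: List[Alignment],
    min_depth: int = 3,
    min_fraction: float = 0.3,
) -> List[Call]:
    # Build the pileup: observed base counts per reference position.
    pileup: Dict[int, Counter] = {}
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(reference):
                break  # ignore read bases hanging off the reference end
            pileup.setdefault(pos, Counter())[base] += 1

    calls: List[Call] = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        ref_base = reference[pos]
        # Most frequent base that differs from the reference, if any.
        alternatives = [(b, c) for b, c in counts.most_common() if b != ref_base]
        if not alternatives:
            continue
        alt_base, alt_count = alternatives[0]
        fraction = alt_count / depth
        if fraction >= min_fraction:
            calls.append((pos + 1, ref_base, alt_base, depth, round(fraction, 3)))
    return calls
```

For example, with `reference = "ACGTACGTAC"` and alignments `[(0, "ACGTA"), (2, "GTTCG"), (1, "CGTTC"), (3, "TTCGT")]`, three of the four reads covering position 5 report `T` instead of `A`, so the function returns `[(5, "A", "T", 4, 0.75)]`.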
Thank you for your interest!
     Keep in touch with us.




                                                                 Matthieu-P. Schapranow, M.Sc.
                                                               schapranow@hpi.uni-potsdam.de
                                                                        http://j.mp/schapranow




                                                                     Hasso Plattner Institute
                                                 Enterprise Platform & Integration Concepts
                                                                     Matthieu-P. Schapranow
                                                                       August-Bebel-Str. 88
                                                                   14482 Potsdam, Germany


High-Performance In-Memory Genome (HIG) Project
