
Creating a High Performance Cyberinfrastructure to Support Analysis of Illumina Metagenomic Data

DNA Day, Department of Computer Science and Engineering, University of California, San Diego, September 16, 2015

  1. “Creating a High Performance Cyberinfrastructure to Support Analysis of Illumina Metagenomic Data”
     DNA Day, Department of Computer Science and Engineering, University of California, San Diego, September 16, 2015
     Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD
     http://lsmarr.calit2.net
  2. The National Science Foundation Has Funded Over 100 Campuses to Build “Big Data Freeways”
     134 Awards, 128 Projects; All but 4 States; 120+ Institutions
  3. Creating a “Big Data” Plane on Campus: NSF-Funded Prism@UCSD and CHERuB
     Prism@UCSD: Phil Papadopoulos (SDSC, Calit2), PI
     CHERuB: Mike Norman (SDSC), PI
  4. SDSC Big Data Compute/Storage Facility, Interconnected at Over 1 Tbps
     Diagram: SDSC supercomputers Comet (VM SC, 2 PF) and Gordon (Big Data SC) and the Oasis Data Store (6000 TB, > 800 Gbps), with links labeled 128 (# of parallel 10 Gbps optical light paths)
     Arista Router Can Switch 576 10 Gbps Light Paths
     128 x 10 Gbps = 1.3 Tbps
     Source: Philip Papadopoulos, SDSC/Calit2
  5. Prism@UCSD Will Link Computational Mass Spectrometry and Genome Sequencing Cores to the Big Data Freeway
     ProteoSAFe: Compute-Intensive Discovery MS at the Click of a Button
     MassIVE: Repository and Identification Platform for All MS Data in the World
     Source: proteomics.ucsd.edu
  6. IDI Enhanced Cyberinfrastructure Supporting Knight Lab
     Diagram components: FIONA (12 Cores/GPU, 128 GB RAM, 3.5 TB SSD, 48 TB Disk, 10 Gbps NIC); Knight Lab (10 Gbps); Gordon; Prism@UCSD; Data Oasis (7.5 PB, 100 GB/s); Knight 1024 Cluster in SDSC Co-Lo; CHERuB (100 Gbps); Emperor & Other Vis Tools; 64 Mpixel Data Analysis Wall
     Other link rates shown: 120 Gbps, 40 Gbps
  7. The Pacific Wave Platform Creates a Regional Science-Driven “Big Data Freeway System”
     Funded by NSF, $5M, Oct 2015-2020
     Chart: Flash Disk to Flash Disk File Transfer Rate
     Source: John Hess, CENIC
  8. Coupling Supercomputing to Illumina Metagenomics Sequencing
     Inflammatory Bowel Disease (IBD) Patients: 5 Ileal Crohn’s Patients, 3 Points in Time; 2 Ulcerative Colitis Patients, 6 Points in Time
     “Healthy” Individuals: 250 Subjects, 1 Point in Time
     Larry Smarr (Colonic Crohn’s): 7 Points in Time
     Each Sample Has 100-200 Million Illumina Short Reads (100 Bases)
     Total of 27 Billion Reads, or 2.7 Trillion Bases
     Source: Jerry Sheehan, Calit2; Weizhong Li, Sitao Wu, CRBS, UCSD
  9. We Created a Reference Database of Known Gut Genomes
     • NCBI, April 2013
       – 2471 Complete + 5543 Draft Bacteria & Archaea Genomes
       – 2399 Complete Virus Genomes
       – 26 Complete Fungi Genomes
       – 309 HMP Eukaryote Reference Genomes
     • Total: 10,741 Genomes, ~30 GB of Sequences
     Now to Align Our 27 Billion Reads Against the Reference Database
     Source: Weizhong Li, Sitao Wu, CRBS, UCSD
  10. Computational NextGen Sequencing Pipeline: From Sequence to Taxonomy and Function
      PI: Weizhong Li, CRBS, UCSD; NIH R01HG005978 (2010-2013, $1.1M)
  11. Mapping Out the Dynamics of Autoimmune Microbiome Ecology Couples Next-Generation Genome Sequencers to Big Data Supercomputers
      Our Team Used 25 CPU-Years to Compute Comparative Gut Microbiomes Starting From 2.7 Trillion DNA Bases of My Samples and Healthy and IBD Controls
      Systems shown: Illumina HiSeq 2000 at JCVI; SDSC Gordon Data Supercomputer
      Source: Weizhong Li, UCSD
  12. Next Step: Programmability, Scalability and Reproducibility Using bioKepler
      www.kepler-project.org, www.biokepler.org
      Resources shown: National Resources (Gordon, Comet, Stampede, Lonestar), Cloud Resources, Optimized Local Cluster Resources
      Source: Ilkay Altintas, SDSC
  13. We Found Major State Shifts in Microbial Ecology Phyla Between Healthy and Two Forms of IBD
      Chart: Most Common Microbial Phyla; panels Average HE, Average Ulcerative Colitis, Average LS, Average Crohn’s Disease
      Annotations: Collapse of Bacteroidetes; Explosion of Actinobacteria; Explosion of Proteobacteria; Hybrid of UC and CD; High Level of Archaea
  14. Our Relative Abundance Results Across ~300 People Reveal Potential Diagnostic Species
      Panels: UC 100x Healthy; UC 100x CD; Healthy 100x CD
      We Produced Similar Results for ~2500 Microbial Species
  15. Dell Analytics Separates the 4 Patient Types in Our Data Using Our Microbiome Species Data
      Groups: Healthy, Ulcerative Colitis, Colonic Crohn’s, Ileal Crohn’s
      Source: Thomas Hill, Ph.D., Executive Director Analytics, Dell | Information Management Group, Dell Software
  16. I Built on Dell Analytics to Show the Dynamic Evolution of My Microbiome Toward and Away from the Healthy State (Colonic Crohn’s)
      Regions: Healthy, Ileal Crohn’s, Colonic Crohn’s
      Seven Time Samples Over 1.5 Years
  17. Time Series Reveals Oscillations in Immune Biomarkers Associated with Time Progression of Autoimmune Disease
      Tracks: Immune & Inflammation Variables, Weekly Symptoms, Pharma Therapies, Stool Samples
      Time axis: 2009-2015
  18. UC San Diego Will Be Carrying Out a Major Clinical Study of IBD Using These Techniques
      Inflammatory Bowel Disease Biobank for Healthy and Diseased Patients
      Drs. William J. Sandborn, John Chang, & Brigid Boland, UCSD School of Medicine, Division of Gastroenterology
      Over 200 Enrolled; Announced November 7, 2014
  19. Next Step: Knight/Smarr Lab Collaboration
      • Smarr Gut Microbiome Time Series
        – From 7 to 50 Times Over Four Years
      • Healthy Human Microbiome
        – Use 255+ Raw Reads from NIH Human Microbiome Project
      • IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative Colitis Patients to ~100
        – 50 Carefully Phenotyped Patients Drawn from Sandborn BioBank
        – 43 Metagenomes from the RISK Cohort of Newly Diagnosed IBD Patients
      • Illumina Reagent Grant Key
        – Enables Deep Metagenomic (and 16S) Sequencing at IGM of Smarr + Sandborn Samples
      • New Software Suite from Knight Lab
        – Major Re-annotation of Reference Genomes, Functional and Taxonomic Variations
        – Novel Assembly Algorithms from Pavel Pevzner (Very Computationally Intensive)
        – See Talk Later This Morning
      • Supercomputer Grant on SDSC Comet (Awarded from XSEDE)
        – From 25 Gordon to 100 Comet Core-Years
        – Each Comet Core 40 GF Peak = 2x Gordon Core: 8x Increase in Compute
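
Slide 4 rounds 128 x 10 Gbps to 1.3 Tbps. The short sketch below redoes that arithmetic and applies the same idealized sum to the 576 paths the Arista router can switch. It assumes decimal units (1 Tbps = 1000 Gbps) and ignores protocol overhead. The constants come from the slide. The function name is purely illustrative.

```python
# Idealized aggregate bandwidth of parallel optical light paths (numbers from slide 4).
LIGHT_PATHS = 128      # parallel 10 Gbps light paths shown on the slide
GBPS_PER_PATH = 10     # each light path carries 10 Gbps
ROUTER_PATHS = 576     # the Arista router can switch 576 such paths


def aggregate_tbps(paths: int, gbps_per_path: float) -> float:
    """Sum of link rates in Tbps (1 Tbps = 1000 Gbps), ignoring protocol overhead."""
    return paths * gbps_per_path / 1000


print(f"{LIGHT_PATHS} x {GBPS_PER_PATH} Gbps = "
      f"{aggregate_tbps(LIGHT_PATHS, GBPS_PER_PATH):.2f} Tbps")  # 1.28, shown as ~1.3 on the slide
print(f"Router capacity: {aggregate_tbps(ROUTER_PATHS, GBPS_PER_PATH):.2f} Tbps")
```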
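
Slides 6 and 7 focus on link rates (10, 40 and 100 Gbps) and file transfer rates. A back-of-the-envelope calculation shows why those rates matter for this data. The sketch computes idealized transfer times for two payloads. The first is the ~30 GB reference database from slide 9. The second is the 2.7 trillion bases from slide 8, under the loudly stated assumption of 1 byte per base. Real FASTQ files also carry quality scores and headers and are larger. The sketch treats the link as the only bottleneck, so real disk-to-disk rates will be lower.

```python
def transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Idealized transfer time for size_bytes over a link of link_gbps (10**9 bits/s).

    Assumes the network link is the only bottleneck: no protocol overhead, no disk limits.
    """
    return size_bytes * 8 / (link_gbps * 1e9)


REFERENCE_DB_BYTES = 30e9   # ~30 GB reference database (slide 9); decimal GB assumed
RAW_BASES = 2.7e12          # 2.7 trillion bases (slide 8)
BYTES_PER_BASE = 1          # ASSUMPTION: 1 byte per base, ignoring FASTQ quality scores/headers

for gbps in (10, 40, 100):  # link speeds that appear on slide 6
    ref_s = transfer_seconds(REFERENCE_DB_BYTES, gbps)
    raw_min = transfer_seconds(RAW_BASES * BYTES_PER_BASE, gbps) / 60
    print(f"{gbps:>3} Gbps: reference DB {ref_s:5.1f} s, raw bases {raw_min:5.1f} min")
```

Under these assumptions, the raw bases take about 36 minutes at 10 Gbps and about 3.6 minutes at 100 Gbps.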
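
The slide 8 totals follow from a single multiplication, because every Illumina read there has the same length of 100 bases. This minimal sketch reproduces the 2.7 trillion figure and the per-sample range. It uses only numbers from the slide.

```python
READ_LENGTH = 100                 # Illumina short reads of 100 bases (slide 8)
TOTAL_READS = 27_000_000_000      # 27 billion reads in total (slide 8)


def total_bases(reads: int, read_length: int) -> int:
    """Number of sequenced bases when every read has the same length."""
    return reads * read_length


print(f"{total_bases(TOTAL_READS, READ_LENGTH):.2e} bases")  # 2.70e+12, i.e. 2.7 trillion

# Per-sample range: 100-200 million reads per sample (slide 8).
low, high = (total_bases(n, READ_LENGTH) for n in (100_000_000, 200_000_000))
print(f"per sample: {low / 1e9:.0f}-{high / 1e9:.0f} billion bases")
```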
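
Slides 9 and 10 describe aligning reads against a reference database to go from sequence to taxonomy. The slides do not say which algorithm the CRBS pipeline uses. The sketch below therefore swaps in a deliberately simple technique: exact k-mer voting. Each read is assigned to the reference genome that shares the most length-k substrings with it. This is a toy illustration of the general idea behind k-mer-based classifiers, not the pipeline's method. The genome names, sequences and k = 5 are all hypothetical. Real reads and genomes are far longer and need much larger k.

```python
from __future__ import annotations

from collections import Counter


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the names of the reference genomes that contain it."""
    index: dict[str, set[str]] = {}
    for name, genome in references.items():
        for km in kmers(genome, k):
            index.setdefault(km, set()).add(name)
    return index


def classify(read: str, index: dict[str, set[str]], k: int) -> str | None:
    """Return the reference sharing the most k-mers with the read, or None if none match."""
    votes: Counter[str] = Counter()
    for km in kmers(read, k):
        for name in index.get(km, ()):
            votes[name] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]  # ties go to the first reference counted


# Hypothetical toy "genomes" and reads, far shorter than real data.
K = 5
REFERENCES = {
    "genome_A": "ACGTACGTTTGACCA",
    "genome_B": "TTGGCCAAGGTTCCA",
}
index = build_index(REFERENCES, K)
for read in ("ACGTTTGAC", "GCCAAGGTT", "GGGGGGGGG"):
    print(read, "->", classify(read, index, K))
```

A dictionary from k-mer to genome set keeps each lookup constant-time. Reads with no shared k-mers stay unclassified rather than being forced onto a genome.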
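
Slide 14 reports ratios such as “UC 100x Healthy” for relative abundances. One common way to obtain such a ratio works in three steps:

- Normalize each sample's species read counts to fractions, so that samples sequenced to different depths become comparable.
- Average those fractions within each group.
- Divide the group means.

The sketch below follows those steps. The slide does not say exactly how its ratios were computed. The species names and counts here are hypothetical, and so is the small pseudocount that guards against division by zero.

```python
from __future__ import annotations


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Turn one sample's raw per-species read counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


def group_mean(samples: list[dict[str, float]], species: str) -> float:
    """Mean relative abundance of one species over a group (absent species count as 0)."""
    return sum(s.get(species, 0.0) for s in samples) / len(samples)


def fold_change(group_a, group_b, species: str, pseudocount: float = 1e-9) -> float:
    """Ratio of group means A/B; the small pseudocount avoids dividing by zero."""
    return (group_mean(group_a, species) + pseudocount) / (group_mean(group_b, species) + pseudocount)


# Hypothetical read counts at different sequencing depths, two samples per group.
healthy = [relative_abundance({"sp_X": 1, "sp_Y": 999}),
           relative_abundance({"sp_X": 2, "sp_Y": 1998})]
uc = [relative_abundance({"sp_X": 100, "sp_Y": 900}),
      relative_abundance({"sp_X": 50, "sp_Y": 450})]

print(f"sp_X, UC vs healthy: {fold_change(uc, healthy, 'sp_X'):.1f}x")  # about 100x
```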
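
Slide 16 tracks a microbiome moving toward and away from the healthy state over seven samples. The slides do not describe the analysis that Dell Analytics performed. As a plainly substituted technique, one simple way to quantify “distance from healthy” is the Bray-Curtis dissimilarity: the sum of absolute abundance differences divided by the sum of all abundances. It is 0 for identical profiles and 1 for profiles that share no species. The healthy reference profile and the four time points below are hypothetical.

```python
from __future__ import annotations


def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared species."""
    species = set(a) | set(b)
    num = sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in species)
    den = sum(a.get(s, 0.0) + b.get(s, 0.0) for s in species)
    return num / den if den else 0.0


healthy_ref = {"sp_X": 0.6, "sp_Y": 0.4}   # hypothetical healthy reference profile
time_series = [                            # hypothetical samples t0..t3
    {"sp_X": 0.2, "sp_Y": 0.8},
    {"sp_X": 0.4, "sp_Y": 0.6},
    {"sp_X": 0.6, "sp_Y": 0.4},
    {"sp_X": 0.3, "sp_Y": 0.7},
]
distances = [bray_curtis(s, healthy_ref) for s in time_series]

print(f"t0: distance {distances[0]:.2f}")
for t, (prev, cur) in enumerate(zip(distances, distances[1:]), start=1):
    trend = "toward" if cur < prev else "away from"
    print(f"t{t}: distance {cur:.2f}, moving {trend} healthy")
```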
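
The “8x Increase in Compute” on slide 19 combines two factors. Moving from 25 to 100 core-years is 4x, and each Comet core has 2x the peak of a Gordon core. The sketch below makes that product explicit. The Gordon per-core peak is derived from the slide's “2x” statement rather than quoted directly. The result compares peak figures only, so actual speedups depend on the workload.

```python
GORDON_CORE_YEARS = 25     # previous allocation (slide 19)
COMET_CORE_YEARS = 100     # new XSEDE allocation on Comet (slide 19)
COMET_CORE_PEAK_GF = 40    # peak GFLOPS per Comet core (slide 19)
GORDON_CORE_PEAK_GF = COMET_CORE_PEAK_GF / 2  # slide 19: a Comet core is 2x a Gordon core


def compute_increase(old_core_years: float, new_core_years: float,
                     old_core_gf: float, new_core_gf: float) -> float:
    """Ratio of total peak compute (core-years x per-core peak GFLOPS), new over old."""
    return (new_core_years * new_core_gf) / (old_core_years * old_core_gf)


factor = compute_increase(GORDON_CORE_YEARS, COMET_CORE_YEARS,
                          GORDON_CORE_PEAK_GF, COMET_CORE_PEAK_GF)
print(f"{factor:.0f}x increase in peak compute")  # 4x core-years times 2x per-core peak
```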
