A Genome Sequence Analysis System Built with Hypertable
Doug Judd, CEO, Hypertable, Inc.
Application Development Team
What is Hypertable?
Hypertable Deployments
Why NoSQL?
Source:  Nature 458, 719-724 (2009)
Source: wired.com, February 2011
Genomics 101
Base Pair (aka “base”)
Gene
Biological Samples
Example Reads File
GTGGATAGGGGGAGACTAATGTAGTATGATTATCATCATCAACAGAAGCTATGACACCAGGATAAA
CATTTCTTATTGCTGAAAGTATTCTATTGTAGAGATGTACCACAATTTGGTTTCTGGTTTTGTATT
GGGAGGATACTAGGGATTACTGAAGCCAACTTTGCAGACTCATACATTTGACTAGACACAGCC
ACATTACAGTTTTCTGAGGAAAATTCTTAAGATGTTACCCCAAAACATAGCATTTTAAATTAAAAC
GGACCGGCTGAAGCCATGGCAGAAGAACATAAATTGTGAAGATTTCATGGGCATTTATTAGTT
GGAAGTGATAAGTGTCCATGAAATCTTCACAATTTATGTTCAGAGATTGCAGTAAAGACAGGTGTA
AAGACACAGCAAAGCTAAGAGGACCCAACACACGGTAGGGTCGGGGACCTTGGAGAAACATGG
TGGCTTCTTCCTACATGCTTGTGATAGATGACCAAAAAACATTTGTTGAGTTGATGAATAGTACAA
AAAAGGGGCGGATAATAAATGAAAAGGGAATGTGCTGTTATTTCCTACTAAGATCAGAAAGAG
ATATAAACAAAAGCTGTCATCACTTAGGGACTTCAGCCACATAAAACAATGTCAGGCTAGTCACTT
AGAGCTTTGGGACTAGTTGAGTGGCAGCTTAACAAAGCAACGCAATATCCATAGGGATTGGGG
ATATTTACATCTAGTGGATTCTACCAGTATGGTGGTCTTATGTGGACTGCACGTGGTTTTCTAGTA
AGATAGCAGCTCTTCCCAAATTTATTTATAATTGTGGCATTATTTATAATATCAAAATATTAT
GTTGCCAAAGGAGATTAACATTTGAGTCAGTGGGCGGGGTAAGGCCGACCTACCCTTAATCTGGTG
GAGAAAGAAGCTGCTAATGGAGTTTAAAAGGTTACTGTCATTAATGAAAAATAAATTTACAGC
CAGACATTTATGAACAGAAATGGGAAAAACACACTAGGAAAGCACTGCAAAGACTAATCTGTCTTT
AAAGGAGATAGAGTGACTCCAGGCCCCTTAGAAATGACTATACCTGGCAGAGCATGCCAACTG
ATGGGCTCGAGTCCTCACAAATATGAATTCCCCCTAAGTCTTGAGAGGTCATTTGTGCATTTGGAA
GGAAGAACATTCCATGCTCATGGGTAGGAAGAATCAATATCGTGAAAATGGTCATACTGCCCA
GCGGGGTTTTTTTTTGTTTCATATTAACTTTAAAGTAGTTTTTTTCCATTTTGTGAAGAAAGACAT
AAAGAACCAAGGCTAATAGTTGTTTGAGTTGTACTTACCATGTTGTTAAATGTCACCTCACAC
CGCTGCCAGCCTATCAGAGCCGGGAATTACACCGTGCTTGGAGTTCTGGCACAGATCCACAGCTAC
AGTTCTTCATTGTAAGAAATGGATGCTAACATGTAACAAGAAAACATCTGAAGGTTAAACTCA
AATAAATGGGTTAATAGTTTGTCTTTCGGTCTTCATACTTTCAATATAAGTGGTTTACTTAGCCGA
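As a rough illustration of how a reads file like the example above might be summarised, the Python sketch below splits whitespace-separated reads and counts the bases they contain. The function name `base_composition` is hypothetical and is not taken from the presentation; only the first two reads of the example are used as input.

```python
from collections import Counter


def base_composition(text):
    """Split whitespace-separated reads and count each base letter.

    Hypothetical helper for illustration; not part of the original system.
    """
    reads = text.split()
    counts = Counter()
    for read in reads:
        counts.update(read.upper())
    return reads, counts


# First two reads from the example reads file.
sample = (
    "GTGGATAGGGGGAGACTAATGTAGTATGATTATCATCATCAACAGAAGCTATGACACCAGGATAAA "
    "CATTTCTTATTGCTGAAAGTATTCTATTGTAGAGATGTACCACAATTTGGTTTCTGGTTTTGTATT"
)
reads, counts = base_composition(sample)
total_bases = sum(counts.values())
```

The total base count equals the summed read lengths, which gives a quick sanity check when loading a file.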
Sequence Alignment
Taxonomy
GenBank
Schema Design
Taxa Table
CREATE TABLE Taxa (ID, Type, Children, Name);

Row key          Column        Value
/1               ID            1
/1               ID:fullName   /root
/1               Type          no rank
/1               Children      1,10239,12884,12908,28384,131567
/1               Name          root
/1/10239         ID            10239
/1/10239         ID:fullName   /root/Viruses
/1/10239         Type          superkingdom
/1/10239         Children      12333,12429,12877,29258,35237, …
/1/10239         Name          Viruses
/1/10239/12333   ID            12333
/1/10239/12333   ID:fullName   /root/Viruses/unclassified phages
/1/10239/12333   Type          no rank
/1/10239/12333   Children      12340,12347,12366,12371,12374, …
/1/10239/12333   Name          unclassified phages
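The Taxa row keys are paths of taxonomy IDs from the root, so in a table sorted by row key a whole subtree occupies one contiguous key range. The Python sketch below models that idea with a plain sorted list; it does not use the Hypertable client API. The helper names `row_key` and `subtree`, and the convention that the root is its own parent, are assumptions made for illustration.

```python
from bisect import bisect_left


def row_key(tax_id, parent):
    """Build a path-style row key such as '/1/10239/12333'."""
    path = []
    node = tax_id
    while True:
        path.append(str(node))
        if parent[node] == node:  # assumption: the root is its own parent
            break
        node = parent[node]
    return "/" + "/".join(reversed(path))


def subtree(sorted_keys, prefix):
    """Return `prefix` and every key below it from a sorted list of row keys."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if key == prefix or key.startswith(prefix + "/"):
            result.append(key)
        elif not key.startswith(prefix):
            break  # past the contiguous range that shares the prefix
    return result


# Parent links for a few taxonomy IDs that appear in the table above.
parent = {1: 1, 10239: 1, 12333: 10239, 12884: 1}
keys = sorted(row_key(tax_id, parent) for tax_id in parent)
viruses = subtree(keys, "/1/10239")
```

Here `viruses` holds the Viruses row key and the key of its descendant 12333, found with one range scan rather than a recursive walk over the Children column.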
Reads Table
CREATE TABLE Reads (Sequence, Quality, GeneKey, Comments);

Row key                               Column                         Value
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1   Sequence                       ATCGCACCATTGAACTCCAGTC...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1   Quality                        eeaeeeede_Ycc]dcacab...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1   Comments:qualityFilter         11071815...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1   Sequence                       GGCTTACGCCTGTAATCCCAGC...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1   Quality                        gfee_cgggegggecggggegc...
AbCam1_100_ACAGTG,HWI...56#ACAGTG/1   GeneKey:gnl|GNOMON|1320663.m   11...
AbCam1_100_ACAGTG,HWI...17#ACAGTG/1   Sequence                       AGGATACGGAAGGCCCAAGGAG...
AbCam1_100_ACAGTG,HWI...17#ACAGTG/1   Quality                        cdd`dffffffgffgggegf^e...
AbCam1_100_ACAGTG,HWI...17#ACAGTG/1   GeneKey:chr10                  110718151643.1308...
AbCam1_100_ACAGTG,HWI...80#ACAGTG/1   Sequence                       ACGGAAGAGCACACGTCTGAAC...
AbCam1_100_ACAGTG,HWI...80#ACAGTG/1   Quality                        cbccb[^WUb]_b`_[bR_]...
AbCam1_100_ACAGTG,HWI...80#ACAGTG/1   Comments:qualityFilter         11071815...
AbCam1_100_ACAGTG,HWI...88#ACAGTG/1   Sequence                       GAACTCCAGTCACACAGTGATC...
AbCam1_100_ACAGTG,HWI...88#ACAGTG/1   Quality                        eeeeeeeeeeeceeeeeaeeTQ...
AbCam1_100_ACAGTG,HWI...88#ACAGTG/1   Comments:qualityFilter         11071815...
Genes Table
CREATE TABLE Genes (Sequence, TaxID, ID, ReadID);

Row key   Column                                                            Value
1000075   Sequence                                                          GAATTCCATGGCAGTAAAACATCTTCCCTTC…
1000075   TaxID                                                             9606
1000075   ID:name                                                           HSLFBPS6 Human fructose-1,6-biphosphatase
1000075   ReadID:0310.Lane8big,HWI-EAS355:8:91:1231:1315#0/1                …
1000075   ReadID:0908.Mexus2.TATTAT,SCS:1:22:395:324#0/1_TA                 …
1000075   ReadID:0916.Enceph2,SCS:6:24:1519:513#0/1
1000075   ReadID:0916.Mexus,SCS:1:22:410:248#0/1
1000075   ReadID:0916.MonkeyAdeno,SCS:2:17:811:769#0/1
1000075   ReadID:0916.MonkeyAdeno,SCS:2:21:1132:1067#0/1
1000075   ReadID:0916.MonkeyAdeno,SCS:2:24:1207:492#0/1
1000075   ReadID:0916.MonkeyAdeno,SCS:2:33:1138:547#0/1
1000075   ReadID:0916.Parecho,SCS:3:4:679:1416#0/1|1
1000075   ReadID:HIV.HIV18_Lane7.s_7_sequence.AAA,SCS:7:30:688 …
1000075   ReadID:HIV.HIV18_Lane7.s_7_sequence.AAA,SCS:7:30:688 …
1000075   ReadID:HIV.HIV18_Lane7.s_7_sequence.unbiased,SCS:7:30 …
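In the Genes table each ReadID column qualifier appears to pair a sample name with a read identifier, which makes the gene row act as a reverse index from gene to matching reads. The Python sketch below groups such qualifiers by sample. It assumes the text before the first comma is the sample name, which is an inference from the slide data rather than something the presentation states, and `reads_by_sample` is a hypothetical helper.

```python
from collections import defaultdict


def reads_by_sample(qualifiers):
    """Group 'sample,read' qualifiers by the text before the first comma.

    Assumption: that leading text is the sample name.
    """
    grouped = defaultdict(list)
    for qualifier in qualifiers:
        sample, _, read_id = qualifier.partition(",")
        grouped[sample].append(read_id)
    return dict(grouped)


# Untruncated ReadID qualifiers from gene row 1000075 above.
qualifiers = [
    "0916.Enceph2,SCS:6:24:1519:513#0/1",
    "0916.Mexus,SCS:1:22:410:248#0/1",
    "0916.MonkeyAdeno,SCS:2:17:811:769#0/1",
    "0916.MonkeyAdeno,SCS:2:21:1132:1067#0/1",
]
grouped = reads_by_sample(qualifiers)
```

Grouping this way shows at a glance which samples produced reads that aligned to a given gene.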
Monitoring Table Overview
Applications
Novel Virus Discovery
Novel Virus Discovery Algorithm Detail
Pathogen Discovery in Cancer Samples
Taxonomic Tree Viewer
Depletion Array (future)
The End
Questions?


Editor's Notes

  1. Improvements in the rate of DNA sequencing over the past 30 years and into the future