Comparison of Genomic DNA to cDNA Alignment Methods
Miguel Galves and Zanoni Dias
Institute of Computing – Unicamp – Campinas – SP – Brazil
{miguel.galves,zanoni}@ic.unicamp.br
Scylla Bioinformatics – Campinas – SP – Brazil
{miguel,zanoni}@scylla.com.br
Agenda
• Introduction
• Problem
• Aligners
• Data Set
• Subsets
• Evaluation Methods
• Results: Exact Alignments
• Results: EST Alignments
• Running Time Comparison
• Conclusions
Introduction
• Identifying genes in uncharacterized DNA sequences is one of the greatest challenges in genomics
• Aligning ESTs (expressed sequence tags) to genomic DNA is one of the most common methods for doing so
• ESTs are key to understanding the inner workings of an organism
– The human genome has between 30,000 and 35,000 genes
– Alternative splicing plays an important role in transcript diversity
Problem
[Figure: an mRNA made of exons and introns; splicing removes the introns to produce the mature mRNA]
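A minimal sketch of the splicing step pictured on this slide, assuming exons are given as hypothetical half-open (start, end) coordinates on the pre-mRNA. The mature mRNA is simply the exons joined in order with the introns dropped; an EST or cDNA aligner has to recover those coordinates from the mature sequence alone. The function name `splice` and the toy sequence are illustrative, not taken from the slides.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Return the mature transcript: exon segments of pre_mrna joined in order.

    Exons are hypothetical half-open [start, end) coordinates; whatever lies
    between consecutive exons is an intron and is dropped.
    """
    ordered = sorted(exons)
    previous_end = 0
    for start, end in ordered:
        # Reject exons that overlap, are empty, or run past the sequence end.
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"invalid or overlapping exon ({start}, {end})")
        previous_end = end
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Toy pre-mRNA: exon 1, an intron (GU...AG), exon 2.
pre_mrna = "CCCGGG" + "GUACAG" + "AAACGA"
print(splice(pre_mrna, [(0, 6), (12, 18)]))  # CCCGGGAAACGA
```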
Problem: How to Solve It?
• Classic algorithms
– Dynamic programming
• Heuristic-based algorithms
– Multi-step approaches
– Built on other tools, such as BLAST, and on local alignments
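Many heuristic tools, BLAST among them, start from short exact word matches ("seeds") before extending them into full alignments. The sketch below illustrates only that first step: it indexes every k-mer of the genomic sequence and reports where each cDNA k-mer occurs. The function name `find_seeds` and the choice of k are hypothetical; this is not the seeding scheme of sim4, Spidey or est_genome.

```python
from collections import defaultdict


def find_seeds(genomic: str, cdna: str, k: int) -> list[tuple[int, int]]:
    """Return (cdna_pos, genomic_pos) pairs where a length-k word occurs in both."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Index every k-mer of the genomic sequence by its start positions.
    index: dict[str, list[int]] = defaultdict(list)
    for g in range(len(genomic) - k + 1):
        index[genomic[g:g + k]].append(g)
    # Look up each cDNA k-mer; .get avoids adding empty entries to the index.
    seeds = []
    for c in range(len(cdna) - k + 1):
        for g in index.get(cdna[c:c + k], []):
            seeds.append((c, g))
    return seeds


genomic = "CCCGGG" + "TTTT" + "AAACGA"  # exon, toy intron, exon
cdna = "CCCGGGAAACGA"
for c, g in find_seeds(genomic, cdna, 6):
    # A change in (genomic_pos - cdna_pos) between seeds hints at an intron.
    print(c, g, g - c)
```

Here the two seeds lie on diagonals 0 and 4, and the shift of 4 corresponds to the 4-base toy intron that a later extension step would have to model as a gap.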
Aligners
• Java implementations of global and semi-global alignment
– Affine gap penalty function
– Linear space
– Global algorithm by Miller and Myers (1988)
– Semi-global algorithm derived from the global algorithm
• Heuristic-based aligners
– sim4, Spidey and est_genome
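The aligner properties listed above can be illustrated with a score-only semi-global alignment using an affine gap penalty (Gotoh-style three-state recurrences) that keeps just one row per state, so memory grows linearly with the genomic length. This is a sketch, not the authors' Java code: it returns only the best score, whereas Miller and Myers (1988) also recover the alignment itself in linear space by divide and conquer. It assumes the cDNA must be aligned end to end while unaligned genomic bases at either end are free, and that a gap of length L costs gap_open + (L - 1) * gap_extend; reading a score system such as (1, -2, -1, 0) as (match, mismatch, gap_open, gap_extend) is also an assumption.

```python
NEG_INF = float("-inf")


def semi_global_score(cdna: str, genomic: str, match: int, mismatch: int,
                      gap_open: int, gap_extend: int) -> float:
    """Best semi-global affine-gap score of cdna against genomic.

    Memory is O(len(genomic)): only the previous row of each state is kept.
    """
    m = len(genomic)
    # Row 0: an empty cDNA prefix may start anywhere in the genome for free.
    prev_match = [0] * (m + 1)        # last column pairs two bases
    prev_del = [NEG_INF] * (m + 1)    # last column: cDNA base vs gap
    prev_ins = [NEG_INF] * (m + 1)    # last column: genomic base vs gap (intron)
    for i in range(1, len(cdna) + 1):
        cur_match = [NEG_INF] * (m + 1)
        cur_del = [NEG_INF] * (m + 1)
        cur_ins = [NEG_INF] * (m + 1)
        cur_del[0] = gap_open + (i - 1) * gap_extend  # cDNA prefix vs nothing
        for j in range(1, m + 1):
            s = match if cdna[i - 1] == genomic[j - 1] else mismatch
            cur_match[j] = max(prev_match[j - 1], prev_del[j - 1],
                               prev_ins[j - 1]) + s
            cur_del[j] = max(prev_match[j] + gap_open, prev_del[j] + gap_extend,
                             prev_ins[j] + gap_open)
            cur_ins[j] = max(cur_match[j - 1] + gap_open,
                             cur_ins[j - 1] + gap_extend,
                             cur_del[j - 1] + gap_open)
        prev_match, prev_del, prev_ins = cur_match, cur_del, cur_ins
    # Free trailing genomic bases: take the best cell anywhere in the last row.
    return max(max(row) for row in (prev_match, prev_del, prev_ins))


genomic = "TT" + "CCCGGG" + "TTTT" + "AAACGA" + "TT"  # flank, exon, intron, exon, flank
cdna = "CCCGGGAAACGA"
print(semi_global_score(cdna, genomic, 1, -2, -1, 0))   # 11
print(semi_global_score(cdna, genomic, 1, -2, -10, 0))  # 2
```

In both runs the best alignment matches all 12 cDNA bases and opens one gap for the 4-base intron; only the opening penalty differs. The nested loops make this O(nm) time, which is consistent with the running-time concern raised in the conclusions.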
Data Set
• Human genome database
– Built from FASTA and GenBank flat-format files from the NCBI repository
• Filtering criteria (entries discarded)
– Genes, mRNAs and CDSs with the /pseudo tag
– mRNAs without any CDS
– Genes without any mRNA
– CDSs matching invalid patterns
• 23,124 genes and 27,448 mRNAs stored in the database
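A hypothetical sketch of the filtering step. The record classes, field names and the CDS pattern check (ATG start, length a multiple of three, final stop codon) are assumptions, since the slide only says that CDSs "matching wrong patterns" were removed. The sketch also assumes that CDSs are dropped individually, and that an mRNA or gene is dropped once nothing valid remains under it.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Cds:
    sequence: str
    pseudo: bool = False  # True when the GenBank feature carries /pseudo


@dataclass
class Mrna:
    cdss: list[Cds] = field(default_factory=list)
    pseudo: bool = False


@dataclass
class Gene:
    name: str
    mrnas: list[Mrna] = field(default_factory=list)
    pseudo: bool = False


def looks_like_cds(seq: str) -> bool:
    """Hypothetical pattern check: ATG start, whole codons, stop codon at the end."""
    return (len(seq) >= 6 and len(seq) % 3 == 0
            and seq.startswith("ATG") and seq[-3:] in STOP_CODONS)


def filter_genes(genes: Iterable[Gene],
                 cds_ok: Callable[[str], bool] = looks_like_cds) -> list[Gene]:
    """Apply the four criteria bottom-up: CDS, then mRNA, then gene."""
    kept = []
    for gene in genes:
        if gene.pseudo:
            continue
        mrnas = []
        for mrna in gene.mrnas:
            if mrna.pseudo:
                continue
            cdss = [c for c in mrna.cdss if not c.pseudo and cds_ok(c.sequence)]
            if cdss:  # drop mRNAs left without any CDS
                mrnas.append(Mrna(cdss=cdss))
        if mrnas:  # drop genes left without any mRNA
            kept.append(Gene(name=gene.name, mrnas=mrnas))
    return kept


genes = [
    Gene("A", [Mrna([Cds("ATGAAATAA")])]),
    Gene("B", [Mrna([Cds("ATGAAATAA", pseudo=True)])]),
    Gene("C"),
]
print([g.name for g in filter_genes(genes)])  # ['A']
```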
Subsets
• Subset 1: 66 genes from chromosome Y with fewer than 100,000 bases
• Subset 2: 50 complete genes from chromosome Y with fewer than 100,000 bases
• Subset 3: 8,056 complete genes from all chromosomes with fewer than 100,000 bases
• Subset 4: 493 artificial ESTs based on complete genes from chromosome 6 with fewer than 100,000 bases
Evaluation Methods
• Number of gaps introduced in the aligned gene sequence
• Delta exons
• Base similarity percentage
• Mismatch percentage
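The slides name these four measures without defining them, so the sketch below encodes one plausible reading, and every definition in it is an assumption. An alignment is two equal-length rows with '-' for gaps. Extra gaps are runs of '-' in the gene row. Predicted exons are genomic intervals covered by the cDNA, split wherever the cDNA row has a gap of at least a hypothetical `min_intron` columns. Delta exons is predicted minus annotated exon count. Base similarity is the intersection over union of predicted and annotated exon bases. Mismatch percentage is taken over columns where both rows hold a base.

```python
import re


def gap_count(row: str) -> int:
    """Number of gaps (maximal runs of '-') in one alignment row."""
    return len(re.findall(r"-+", row))


def predicted_exons(cdna_row: str, gene_row: str,
                    min_intron: int) -> list[tuple[int, int]]:
    """Half-open genomic intervals covered by the cDNA, split at long cDNA gaps."""
    exons: list[tuple[int, int]] = []
    start = end = None
    g = 0        # genomic position of the next non-gap base in gene_row
    gap_len = 0  # length of the current run of gaps in the cDNA row
    for c, d in zip(cdna_row, gene_row):
        if c == "-":
            gap_len += 1
        elif d != "-":  # cDNA base aligned to genomic position g
            if start is not None and gap_len >= min_intron:
                exons.append((start, end))  # a long cDNA gap closes the exon
                start = None
            if start is None:
                start = g
            end = g + 1
            gap_len = 0
        if d != "-":
            g += 1
    if start is not None:
        exons.append((start, end))
    return exons


def delta_exons(predicted: list[tuple[int, int]],
                annotated: list[tuple[int, int]]) -> int:
    return len(predicted) - len(annotated)


def base_similarity(predicted: list[tuple[int, int]],
                    annotated: list[tuple[int, int]]) -> float:
    """Percent intersection over union of predicted and annotated exon bases."""
    pred = {p for s, e in predicted for p in range(s, e)}
    ann = {p for s, e in annotated for p in range(s, e)}
    union = pred | ann
    return 100.0 * len(pred & ann) / len(union) if union else 100.0


def mismatch_percentage(cdna_row: str, gene_row: str) -> float:
    """Percent of base-to-base columns whose bases differ."""
    pairs = [(c, d) for c, d in zip(cdna_row, gene_row) if c != "-" and d != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(c != d for c, d in pairs) / len(pairs)


cdna_row = "CCCGGG----AAACGA"
gene_row = "CCCGGGTTTTAAACGA"
exons = predicted_exons(cdna_row, gene_row, min_intron=3)
annotated = [(0, 6), (10, 15)]  # hypothetical annotation
print(exons, gap_count(gene_row), delta_exons(exons, annotated),
      round(base_similarity(exons, annotated), 2),
      mismatch_percentage(cdna_row, gene_row))
```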
Experimental Method
• Using subsets 1 and 2, an alignment strategy and two score systems (out of 15 previously defined) were chosen:
– Semi-global aligner
– (1, -2, -1, 0) and (1, -2, -10, 0) score systems
• The classic semi-global aligner was then compared to sim4, Spidey and est_genome on both subsets 3 and 4
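The two chosen score systems differ only in their third value. Assuming, as a hypothesis the slides do not confirm, that the tuple means (match, mismatch, gap_open, gap_extend) and that a gap of length L costs gap_open + (L - 1) * gap_extend, the sketch below scores a fixed alignment under both systems. With a zero extension penalty, an intron of 4 bases and one of 1,000 bases cost exactly the same.

```python
ScoreSystem = tuple[int, int, int, int]  # assumed order: (match, mismatch, gap_open, gap_extend)


def score_alignment(top: str, bottom: str, system: ScoreSystem) -> int:
    """Score an existing gapped alignment with an affine gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("alignment rows must have equal length")
    match, mismatch, gap_open, gap_extend = system
    score = 0
    gap_row = None  # which row held a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # Continuing a gap in the same row extends it; otherwise open a new one.
            score += gap_extend if row == gap_row else gap_open
            gap_row = row
        else:
            score += match if a == b else mismatch
            gap_row = None
    return score


for n in (4, 1000):  # a short and a long toy intron
    cdna_row = "CCCGGG" + "-" * n + "AAACGA"
    gene_row = "CCCGGG" + "T" * n + "AAACGA"
    print(n, [score_alignment(cdna_row, gene_row, s)
              for s in ((1, -2, -1, 0), (1, -2, -10, 0))])
```

Under this reading, the larger opening penalty in (1, -2, -10, 0) discourages spurious short gaps while still leaving long introns cheap relative to mismatches across them.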
Results: Exact Alignments
Extra Gaps

Strategy             Avg     SD   % scoring 0
SG(1, -2, -1, 0)    0.00   0.00       100.00%
SG(1, -2, -10, 0)   0.00   0.00       100.00%
sim4                1.11   1.63        54.56%
est_genome         16.99  21.49        27.84%
Spidey              0.15   1.39        97.43%
Results: Exact Alignments
Delta Exons

Strategy             Avg     SD   % scoring 0
SG(1, -2, -1, 0)    0.00   0.00       100.00%
SG(1, -2, -10, 0)   0.01   0.07        99.91%
sim4               -0.01   0.20        97.46%
est_genome         -0.14   0.30        76.79%
Spidey             -4.04   3.10         0.00%
Results: Exact Alignments
Base Similarity

Strategy               Avg      SD   % scoring 100%
SG(1, -2, -1, 0)    99.89%   0.49%           53.56%
SG(1, -2, -10, 0)   99.89%   0.49%           53.49%
sim4                99.39%   1.34%           22.79%
est_genome          53.83%  35.00%           18.11%
Spidey              80.34%  36.49%           44.25%
Results: Exact Alignments
Mismatch Percentage

Strategy               Avg      SD     % scoring 0%
SG(1, -2, -1, 0)     0.00%   0.00%          100.00%
SG(1, -2, -10, 0)    0.01%   0.03%           99.47%
sim4                 0.17%   0.21%           36.68%
est_genome           1.19%   1.26%           21.55%
Spidey               0.15%   0.98%           90.65%
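Each exact-alignment table reports, per strategy, the average (Avg) and standard deviation (SD) of a per-gene measure plus the share of genes that reached the ideal value. A minimal sketch of that summary, assuming population standard deviation (the slides do not say whether sample or population SD was used) and using made-up input values rather than data from the study:

```python
from statistics import mean, pstdev


def summarize(values: list[float], ideal: float) -> tuple[float, float, float]:
    """Return (Avg, SD, % of values equal to the ideal score) for one strategy."""
    if not values:
        raise ValueError("need at least one value")
    hits = sum(1 for v in values if v == ideal)
    return mean(values), pstdev(values), 100.0 * hits / len(values)


# Hypothetical per-gene extra-gap counts for one aligner (not data from the study).
avg, sd, pct = summarize([0, 0, 1, 3], ideal=0)
print(f"{avg:.2f} {sd:.2f} {pct:.2f}%")
```

Swapping `pstdev` for `statistics.stdev` gives the sample standard deviation instead; with thousands of genes per subset the two differ very little.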
Results: EST Alignments
Running Time Comparison (seconds per alignment)

Aligner        EST-to-DNA   mRNA-to-DNA
sim4                0.013         0.170
Spidey              0.066         0.140
est_genome          0.640         3.400
Semi-global         0.670         5.170
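A back-of-envelope reading of the running-time table, assuming every alignment takes the reported average time. The workload of 8,056 alignments reuses the size of subset 3 purely for illustration; the slides do not state which subset the timings were measured on. From the table alone, the semi-global aligner is roughly 30 times slower than sim4 for mRNA-to-DNA and about 50 times slower for EST-to-DNA.

```python
# Seconds per alignment from the running-time table: (EST-to-DNA, mRNA-to-DNA).
TIMES = {
    "sim4": (0.013, 0.170),
    "Spidey": (0.066, 0.140),
    "est_genome": (0.640, 3.400),
    "Semi-global": (0.670, 5.170),
}


def projected_hours(seconds_per_alignment: float, n_alignments: int) -> float:
    """Total hours, assuming every alignment takes the average time."""
    return seconds_per_alignment * n_alignments / 3600.0


def slowdown(aligner: str, baseline: str = "sim4", column: int = 1) -> float:
    """How many times slower `aligner` is than `baseline` (column 0 = EST, 1 = mRNA)."""
    return TIMES[aligner][column] / TIMES[baseline][column]


for name, (_, mrna_time) in TIMES.items():
    # 8,056 = size of subset 3, used here only as an illustrative workload.
    print(f"{name:12s} {projected_hours(mrna_time, 8056):6.2f} h  "
          f"{slowdown(name):5.1f}x sim4")
```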
Conclusions
• The classic semi-global algorithm produces good results
– Running time is a problem, although it can be improved
• Sim4 produces the best results among the external programs tested
Thanks
