New Strategy to detect SNPs
Miguel Galves
José Augusto Quitzau
Zanoni Dias
Scylla Bioinformatics – Brazil
{miguel,jquitzau,zanoni}@scylla.com.br
Agenda
• Introduction
• HIV Dataset
• Detection Strategy
• Trimming Procedure
• Base-Calling Strategies
• Filter Algorithm
• Consensus Algorithm
• Tests Protocol
• Results
• Discussion
Introduction
• Polymorphism: a locus (one or more base pairs) at which different alleles exist among the individuals of a population
– The second most frequent allele must appear in at least 1% of the individuals
• SNP: a polymorphism at a single base pair position
• SNP discovery is very important for understanding complex diseases
HIV Dataset
• HIV genetic sequences:
– 1302 bp
– Well-conserved region
• 35 batches from 35 individuals, each with:
– 6 PCR reads, with an average size of 690 bp
– 1 validated sequence, with manually annotated SNPs
• HIV reference sequence
Detection Strategy: Survey
• Trimming Procedure
• Base-Calling Correction
• SNPs Filter
• Batch Consensus Algorithm
Trimming Procedure
• Filters low-quality read ends
• Converts phred's quality sequence to an error probability sequence:
⇒ Q = -10 × log10(p)
• Subtract 0.05 from all values (an error probability of 0.05 corresponds to Q ≈ 13)
• Maximum Score Subsequence Algorithm
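The trimming steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the sign convention is an assumption (each base is scored as 0.05 minus its error probability, so that bases better than Q ≈ 13 score positively and the best-scoring run is the part of the read that is kept).

```python
def error_probabilities(qualities):
    """Invert Q = -10 * log10(p) to get p for each phred quality value."""
    return [10 ** (-q / 10) for q in qualities]


def trim(qualities, cutoff=0.05):
    """Return (start, end), a half-open interval of the read to keep.

    Each base is scored as cutoff - p (assumed sign convention), and the
    contiguous run with the maximum total score is found in one pass.
    Returns (0, 0) when no base scores positively.
    """
    best_sum, best_span = 0.0, (0, 0)
    current_sum, current_start = 0.0, 0
    for i, p in enumerate(error_probabilities(qualities)):
        if current_sum <= 0:
            # A non-positive prefix can never help: restart the run here.
            current_sum, current_start = 0.0, i
        current_sum += cutoff - p
        if current_sum > best_sum:
            best_sum, best_span = current_sum, (current_start, i + 1)
    return best_span


# Hypothetical read: two poor bases, three good ones, one poor, one good.
start, end = trim([5, 5, 30, 30, 30, 4, 40])
```

A single linear scan is enough here because the maximum score subsequence problem only needs the running sum and the best sum seen so far.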
Base Calling: Area Ratio
• Base calling is done in 5 steps:
1. Chromatogram area delimitation
2. Peak search
3. Choice of the nearest peaks
4. Calculation of the nearest peaks' areas
5. Calculation of the polymorphic/reference peak area ratio
• If the calculated ratio is above a certain threshold, the point is considered a polymorphism.
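The five steps can be sketched as below. Everything concrete in this sketch is an assumption made for illustration: the chromatogram is taken to be a dict mapping each base to a list of intensity samples, the delimited area is a fixed half-window around the called position, a peak's area is the sum of samples between its flanking valleys, and the threshold value is supplied by the caller because the slides do not state it.

```python
def find_peaks(signal, start, end):
    """Step 2: indices of local maxima of signal within [start, end)."""
    lo = max(start, 1)
    hi = min(end, len(signal) - 1)
    return [i for i in range(lo, hi)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def peak_area(signal, peak):
    """Step 4: sum the samples from the left valley to the right valley."""
    left = peak
    while left > 0 and signal[left - 1] < signal[left]:
        left -= 1
    right = peak
    while right < len(signal) - 1 and signal[right + 1] < signal[right]:
        right += 1
    return float(sum(signal[left:right + 1]))


def nearest_peak_area(signal, position, half_window):
    """Steps 1-4 for one channel; 0.0 if the window holds no peak."""
    peaks = find_peaks(signal, position - half_window,
                       position + half_window + 1)
    if not peaks:
        return 0.0
    nearest = min(peaks, key=lambda i: abs(i - position))  # step 3
    return peak_area(signal, nearest)


def area_ratio_call(traces, reference_base, position, half_window, threshold):
    """Step 5: ratio of the best non-reference peak area to the reference one."""
    ref_area = nearest_peak_area(traces[reference_base], position, half_window)
    best_base, best_area = None, 0.0
    for base, signal in traces.items():
        if base == reference_base:
            continue
        area = nearest_peak_area(signal, position, half_window)
        if area > best_area:
            best_base, best_area = base, area
    if ref_area == 0:
        ratio = float("inf") if best_area > 0 else 0.0
    else:
        ratio = best_area / ref_area
    return best_base, ratio, ratio > threshold


# Hypothetical traces: a strong A peak with a weaker C peak underneath it.
traces = {"A": [0, 1, 4, 9, 4, 1, 0], "C": [0, 0, 1, 3, 1, 0, 0],
          "G": [0] * 7, "T": [0] * 7}
base, ratio, is_polymorphic = area_ratio_call(traces, "A", 3, 2, threshold=0.2)
```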
Base Calling: Area Delimitation
Base Calling: Peak Identification
Base Calling: Average Height Ratio
• Almost the same steps:
1. Chromatogram area delimitation
2. Peak search
3. Choice of the nearest peaks
4. Calculation of the nearest peaks' average heights
5. Calculation of the polymorphic/reference peak average height ratio
• Again, if the calculated ratio is above a certain threshold, the point is considered a polymorphism.
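A minimal sketch of the last two steps of the average height variant follows. It assumes the peaks have already been delimited and are passed in as lists of intensity samples, and it takes "average height" to mean the mean of those samples; both the function names and the threshold are hypothetical.

```python
def average_height(peak_samples):
    """Mean intensity over the samples of one delimited peak."""
    if not peak_samples:
        return 0.0
    return sum(peak_samples) / len(peak_samples)


def height_ratio_call(reference_peak, candidate_peak, threshold):
    """Ratio of candidate to reference average height, plus the call."""
    ref = average_height(reference_peak)
    cand = average_height(candidate_peak)
    if ref == 0:
        ratio = float("inf") if cand > 0 else 0.0
    else:
        ratio = cand / ref
    return ratio, ratio > threshold


# Hypothetical peaks: a wide reference peak and a narrow candidate peak.
ratio, is_polymorphic = height_ratio_call([1, 4, 9, 4, 1], [1, 3, 1],
                                          threshold=0.4)
```

Because the average divides by peak width, a narrow secondary peak weighs more here than it would under an area ratio, which is one way the two strategies can disagree on the same position.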
Base Calling: Peak Identification
Filter Algorithm
• Analyzes each sequence
• Uses a window-based algorithm to eliminate adjacent SNPs
– Window size: 11 bases
– An empirical scoring system is applied to the polymorphisms in the window
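The slides do not give the empirical scoring system, so the sketch below only illustrates the window mechanics with a stand-in rule. It assumes each candidate SNP comes with a numeric score (for example, its base-calling ratio) and keeps a candidate only if it outscores every other candidate inside the 11-base window centred on it. The function name and the rule itself are hypothetical.

```python
def filter_adjacent_snps(candidates, window=11):
    """Keep candidates that outscore all neighbours within the window.

    candidates maps a read position to a score. The keep-the-best rule is
    a placeholder for the empirical scoring system, which is not specified.
    """
    half = window // 2
    kept = []
    for pos in sorted(candidates):
        neighbours = [p for p in candidates
                      if p != pos and abs(p - pos) <= half]
        if all(candidates[pos] > candidates[p] for p in neighbours):
            kept.append(pos)
    return kept


# Hypothetical candidates: three clustered calls and one isolated call.
kept = filter_adjacent_snps({10: 0.9, 12: 0.4, 14: 0.5, 40: 0.3})
```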
Consensus Algorithm
• Rule-based algorithm
– Empirical rules
• Analyzes the whole cross section to define a consensus
– Takes into account nucleotide frequencies and qualities
• Does not create N symbols or tri-allelic polymorphisms.
Consensus Algorithm: Example
Sequence 1   A25  C30  C18  C30  A21
Sequence 2   A30  C25  C15  C25  A16
Sequence 3   -    M18  A9   C30  -
Sequence 4   -    -    S12  G17  T18
Consensus    A    M    S    S    W
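One simple rule that happens to reproduce this example is sketched below. It is not the authors' rule set, which the slides describe only as empirical. The sketch assumes each entry is an IUPAC symbol followed by its quality value, credits an ambiguity code's quality to each of its constituent bases, and reports the two best-supported bases as a single IUPAC symbol, so it can never emit N or a three-base code.

```python
# IUPAC symbols limited to one or two bases (no N, no tri-allelic codes).
IUPAC_TO_BASES = {"A": "A", "C": "C", "G": "G", "T": "T",
                  "M": "AC", "R": "AG", "W": "AT",
                  "S": "CG", "Y": "CT", "K": "GT"}
BASES_TO_IUPAC = {frozenset(bases): symbol
                  for symbol, bases in IUPAC_TO_BASES.items()}


def consensus_symbol(column):
    """column is a list of (symbol, quality) tuples, or None for a gap."""
    support = {}
    for call in column:
        if call is None:
            continue
        symbol, quality = call
        for base in IUPAC_TO_BASES[symbol]:
            support[base] = support.get(base, 0) + quality
    if not support:
        return "-"
    # Highest summed quality first; ties broken alphabetically.
    top_two = sorted(support, key=lambda b: (-support[b], b))[:2]
    return BASES_TO_IUPAC[frozenset(top_two)]


reads = [
    [("A", 25), ("C", 30), ("C", 18), ("C", 30), ("A", 21)],
    [("A", 30), ("C", 25), ("C", 15), ("C", 25), ("A", 16)],
    [None,      ("M", 18), ("A", 9),  ("C", 30), None],
    [None,      None,      ("S", 12), ("G", 17), ("T", 18)],
]
consensus = "".join(consensus_symbol(col) for col in zip(*reads))
# consensus == "AMSSW", matching the Consensus row of the example
```

Note that this stand-in reports a second base whenever it has any support at all; a practical rule set would presumably also require the minor base to reach some minimum frequency or quality.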
Tests Protocol: Third Party Packages
• Two external packages were used as a baseline for our results:
– Polybayes: SNP detection tool based on Bayesian methods
– Polyphred: SNP detection tool based on chromatogram analysis
• ACE file (contig and consensus) created for each batch using phrap
• ACE file analyzed by Polyphred and Polybayes
• Results viewed with consed
Tests Protocol: Our strategy
• Reads trimmed using the Maximum Score Subsequence Algorithm
• Base-calling analysis and correction using the algorithms described previously
• SNP filtering
• Multiple alignment
– Reference sequence as anchor
• Consensus creation
Third Party Results: Polybayes
• Polybayes detected SNPs in only 2 batches out of 35

Batch     Existing SNPs  Detected SNPs  Correct SNPs  False Positives  False Negatives
Batch 13  12             1              1             0                11
Batch 15  5              1              0             1                5
Third Party Results: Polyphred
• Polyphred detected SNPs in only 4 batches out of 35

Batch     Existing SNPs  Detected SNPs  Correct SNPs  False Positives  False Negatives
Batch 07  10             1              0             1                10
Batch 14  4              3              0             3                4
Batch 32  26             1              0             1                26
Batch 35  15             8              1             7                14
Trimming Results
• Average read size:
– Before trimming: 690.15 bp
– After trimming: 374.74 bp
– Reduction of 45%
• Average base coverage of the reference sequence:
– Before trimming: 2.69
– After trimming: 1.77
Results: True Positive (%) x batch
Results: False Negative (%) x batch
Results: False Positive (%) x batch
Results: Summary
      Polybayes      Polyphred      Area            Avg. Height
      Avg    SD      Avg    SD      Avg    SD       Avg    SD
TP    0.3    1.4     0.2    1.1     75.4   19.2     52.6   21.5
FN    99.7   1.4     99.8   1.1     23.2   18.4     45.6   21.7
DP    0.0    0.0     0.0    0.0     1.4    4.3      1.8    4.0
FP    2.9    16.9    11.1   31.3    393.9  312.3    554.4  511.3

TP + FN + DP = 100%
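The per-batch percentages behind such a summary can be sketched as below. The sketch assumes every rate is expressed relative to the number of existing (annotated) SNPs in the batch, which is consistent with TP + FN + DP summing to 100% and would explain why FP can exceed 100%. The function name is hypothetical, and DP is left out because the slides do not define how it is counted.

```python
def batch_percentages(existing, correct, false_positives, false_negatives):
    """Express counts as percentages of the existing SNPs in one batch."""
    if existing <= 0:
        raise ValueError("a batch needs at least one existing SNP")
    scale = 100.0 / existing
    return {
        "TP": correct * scale,
        "FN": false_negatives * scale,
        "FP": false_positives * scale,  # may exceed 100
    }


# Polyphred, Batch 35: 15 existing, 1 correct, 7 false positives, 14 false negatives.
rates = batch_percentages(existing=15, correct=1,
                          false_positives=7, false_negatives=14)
```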
Discussion
• Polybayes and Polyphred need large sets of data to produce good results
• Our algorithm produces quite satisfactory results, taking into account the characteristics of the data:
– Low average coverage
– High number of low-quality bases
– High number of polymorphisms (viral DNA)
• The Area Ratio strategy produces better results than the Average Height strategy
Future Work
• Test the algorithms with larger batches, with higher average coverage, to improve the consensus algorithm
• Reproduce the experiments using genetic sequences of more conserved life forms, such as mammals
Acknowledgments

 
Environment modelling and its environmental aspects
Environment modelling and its environmental aspectsEnvironment modelling and its environmental aspects
Environment modelling and its environmental aspects
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptx
 
Production technology of Brinjal -Solanum melongena
Production technology of Brinjal -Solanum melongenaProduction technology of Brinjal -Solanum melongena
Production technology of Brinjal -Solanum melongena
 

New Strategy to detect SNPs

  • 1. New Strategy to detect SNPs
      Miguel Galves, José Augusto Quitzau, Zanoni Dias
      Scylla Bioinformatics, Brazil
      {miguel,jquitzau,zanoni}@scylla.com.br
  • 2. Agenda
      • Introduction
      • HIV Dataset
      • Detection Strategy
      • Trimming Procedure
      • Base-Calling Strategies
      • Filter Algorithm
      • Consensus Algorithm
      • Tests Protocol
      • Results
      • Discussion
  • 3. Introduction
      • Polymorphism: a set of base-pair loci at which different alleles exist among the individuals of a population
          – The second most frequent allele must appear in at least 1% of the individuals
      • SNP: a polymorphism at a single base-pair position
      • SNP discovery is very important for understanding complex diseases
  • 4. HIV Dataset
      • HIV genetic sequences:
          – 1302 bp
          – Well-conserved region
      • 35 batches from 35 individuals:
          – 6 PCR reads, with an average size of 690 bp
          – 1 validated sequence, with manually annotated SNPs
      • HIV reference sequence
  • 5. Detection Strategy: Survey
      • Trimming Procedure
      • Base-Calling Correction
      • SNPs Filter
      • Batch Consensus Algorithm
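  • 6. Trimming Procedure
      • Filters out low-quality read ends
      • Converts phred's quality sequence to an error probability sequence: Q = -10 × log10(p), i.e. p = 10^(-Q/10)
      • Scores each base as 0.05 - p, which is positive for Q above about 13
      • Applies the Maximum Score Subsequence Algorithm to keep the highest-scoring stretch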
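The trimming steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `trim_read` and the sample qualities are hypothetical, and it assumes each base is scored as 0.05 minus its error probability before a Kadane-style scan finds the maximum-scoring contiguous stretch.

```python
def trim_read(qualities, cutoff=0.05):
    """Return (start, end) of the half-open slice with the highest total score.

    Assumption: a base scores cutoff - p, where p = 10 ** (-Q / 10).
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(qualities):
        p = 10 ** (-q / 10)        # phred quality -> error probability
        run_sum += cutoff - p      # good bases add, bad bases subtract
        if run_sum <= 0:
            # The current run cannot help any later run: restart after i.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


# Hypothetical read with poor ends and a good core.
quals = [5, 8, 30, 35, 40, 32, 9, 6]
start, end = trim_read(quals)
trimmed = quals[start:end]         # [30, 35, 40, 32]
```

A read with no base above the cutoff returns the empty slice (0, 0), which a caller would probably discard.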
  • 7. Base Calling: Area Ratio
      • The base calling is done in 5 steps:
          1. Chromatogram area delimitation
          2. Peak search
          3. Choice of the nearest peaks
          4. Calculation of the nearest peaks' areas
          5. Calculation of the polymorphic/reference peak area ratio
      • If the calculated ratio is above a certain threshold, the point is considered a polymorphism.
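One way to read the five steps above is sketched below. The slides do not give the window sizes, the peak definition, or the threshold, so `window`, `half_width`, the local-maximum peak search, and the sample traces are all assumptions made for illustration.

```python
def find_peaks(trace):
    """Indices of local maxima in one fluorescence channel (assumed peak definition)."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i - 1] < trace[i] >= trace[i + 1]]


def peak_area(trace, peak, half_width):
    """Sum of the samples within half_width of the peak."""
    lo = max(0, peak - half_width)
    hi = min(len(trace), peak + half_width + 1)
    return sum(trace[lo:hi])


def area_ratio(ref_trace, alt_trace, position, window=5, half_width=2):
    """Polymorphic/reference peak area ratio around a called base."""
    # Step 1: delimit the chromatogram region around the called position.
    lo = max(0, position - window)
    hi = min(len(ref_trace), position + window + 1)
    ref_win, alt_win = ref_trace[lo:hi], alt_trace[lo:hi]
    centre = position - lo
    # Steps 2 and 3: search for peaks and keep the one nearest the position.
    ref_peaks, alt_peaks = find_peaks(ref_win), find_peaks(alt_win)
    if not ref_peaks or not alt_peaks:
        return 0.0
    ref_peak = min(ref_peaks, key=lambda i: abs(i - centre))
    alt_peak = min(alt_peaks, key=lambda i: abs(i - centre))
    # Steps 4 and 5: compute both areas and their ratio.
    ref_area = peak_area(ref_win, ref_peak, half_width)
    if ref_area == 0:
        return 0.0
    return peak_area(alt_win, alt_peak, half_width) / ref_area


# Hypothetical traces: a strong reference peak and a weaker second peak.
ref = [0, 1, 4, 9, 12, 9, 4, 1, 0, 0, 0]
alt = [0, 0, 2, 5, 6, 5, 2, 0, 0, 0, 0]
ratio = area_ratio(ref, alt, position=4)   # 20 / 38
is_polymorphic = ratio > 0.3               # 0.3 is an illustrative threshold
```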
  • 8. Base Calling: Area Delimitation
  • 9. Base Calling: Peak Identification
  • 10. Base Calling: Average Height Ratio
      • Almost the same steps:
          1. Chromatogram area delimitation
          2. Peak search
          3. Choice of the nearest peaks
          4. Calculation of the nearest peaks' average heights
          5. Calculation of the polymorphic/reference peak average height ratio
      • Again, if the calculated ratio is above a certain threshold, the point is considered a polymorphism.
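The average height variant could look like the sketch below. It assumes the peak positions are already known and that a peak extends outward while the signal keeps falling and stays above zero; neither detail is stated in the slides, and the traces are made up. Under this assumption the measure differs from a plain area comparison whenever the two peaks have different widths.

```python
def peak_bounds(trace, peak):
    """Inclusive bounds of a peak, assuming it ends where the signal stops falling."""
    lo = peak
    while lo > 0 and 0 < trace[lo - 1] < trace[lo]:
        lo -= 1
    hi = peak
    while hi < len(trace) - 1 and 0 < trace[hi + 1] < trace[hi]:
        hi += 1
    return lo, hi


def average_height(trace, peak):
    """Mean signal over the samples that belong to the peak."""
    lo, hi = peak_bounds(trace, peak)
    samples = trace[lo:hi + 1]
    return sum(samples) / len(samples)


def height_ratio(ref_trace, alt_trace, ref_peak, alt_peak):
    """Polymorphic/reference average height ratio."""
    ref_height = average_height(ref_trace, ref_peak)
    if ref_height == 0:
        return 0.0
    return average_height(alt_trace, alt_peak) / ref_height


# Hypothetical traces: a broad reference peak and a narrow second peak.
ref = [0, 1, 4, 9, 12, 9, 4, 1, 0]
alt = [0, 0, 0, 6, 8, 6, 0, 0, 0]
ratio = height_ratio(ref, alt, ref_peak=4, alt_peak=4)   # (20 / 3) / (40 / 7)
```

For these traces the summed areas are 40 and 20, so an area comparison gives 0.5 while the average height ratio is above 1.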
  • 11. Base Calling: Peak Identification
  • 12. Filter Algorithm
      • Analyzes each sequence
      • Uses a window-based algorithm to eliminate adjacent SNPs
          – Window size: 11 bases
          – Empirical score system assigned to the polymorphisms in the window
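The slides do not describe the empirical score system, so the sketch below swaps in a simple stand-in: a candidate SNP is dropped when too many other candidates fall inside the 11-base window centred on it. The function name, `max_neighbours`, and the positions are hypothetical.

```python
def filter_clustered_snps(positions, window=11, max_neighbours=1):
    """Keep candidate SNP positions that are not part of a dense cluster.

    Stand-in scoring rule (an assumption, not the authors' score system):
    count the other candidates within the window centred on each position.
    """
    half = window // 2
    ordered = sorted(positions)
    kept = []
    for pos in ordered:
        neighbours = sum(1 for other in ordered
                         if other != pos and abs(other - pos) <= half)
        if neighbours <= max_neighbours:
            kept.append(pos)
    return kept


# Hypothetical candidates: a tight cluster, an isolated call, and a pair.
candidates = [10, 12, 14, 40, 90, 93]
filtered = filter_clustered_snps(candidates)   # [40, 90, 93]
```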
  • 13. Consensus Algorithm
      • Rule-based algorithm
          – Empirical rules
      • Analyzes the whole cross section to define a consensus
          – Takes nucleotide frequencies and qualities into account
      • Creates neither N symbols nor tri-allelic polymorphisms
  • 14. Consensus Algorithm: Example
      Sequence 1    A25   C30   C18   C30   A21
      Sequence 2    A30   C25   C15   C25   A16
      Sequence 3    -     M18   A9    C30   -
      Sequence 4    -     -     S12   G17   T18
      Consensus     A     M     S     S     W
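The empirical rules themselves are not listed in the slides. The sketch below uses one hypothetical rule that happens to reproduce the example above: each allele accumulates the qualities of the calls that support it (an IUPAC ambiguity code supports both of its alleles), alleles with total support below `min_support` are dropped, and at most the two best-supported alleles are kept, so no N and no tri-allelic code can appear.

```python
# IUPAC codes for single bases and two-allele ambiguities.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT"}
CODE_FOR = {frozenset(alleles): code for code, alleles in IUPAC.items()}


def parse(cell):
    """Turn 'C30' into ('C', 30); '-' means the read does not cover the column."""
    return None if cell == "-" else (cell[0], int(cell[1:]))


def consensus_column(calls, min_support=10):
    """Consensus symbol for one alignment column (hypothetical rule)."""
    support = {}
    for call in calls:
        if call is None:
            continue
        symbol, quality = call
        for base in IUPAC[symbol]:
            support[base] = support.get(base, 0) + quality
    if not support:
        return "-"
    ranked = sorted((b for b in support if support[b] >= min_support),
                    key=lambda b: -support[b])
    if not ranked:
        # Nothing reaches the threshold: fall back to the best-supported base.
        ranked = [max(support, key=support.get)]
    return CODE_FOR[frozenset(ranked[:2])]


reads = [
    ["A25", "C30", "C18", "C30", "A21"],
    ["A30", "C25", "C15", "C25", "A16"],
    ["-",   "M18", "A9",  "C30", "-"],
    ["-",   "-",   "S12", "G17", "T18"],
]
consensus = "".join(
    consensus_column([parse(read[i]) for read in reads]) for i in range(5)
)
```

In the third column, for instance, C collects 18 + 15 + 12 = 45, G collects 12, and A collects only 9, so A is dropped and the column becomes S.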
  • 15. Tests Protocol: Third Party Packages
      • Two external packages used to compare our results:
          – Polybayes: SNP detection tool based on Bayesian methods
          – Polyphred: SNP detection tool based on chromatogram analysis
      • ACE file (contig and consensus) created for each batch using phrap
      • ACE file analyzed by Polyphred and Polybayes
      • Results viewed with consed
  • 16. Tests Protocol: Our strategy
      • Reads trimmed using the Maximum Score Subsequence Algorithm
      • Base-calling analysis and correction using the algorithms described previously
      • SNP filtering
      • Multiple alignment
          – Reference sequence as anchor
      • Consensus creation
  • 17. Third Party Results: Polybayes
      • Polybayes detected SNPs in only 2 batches out of 35
      Batch      Existing SNPs   Detected SNPs   Correct SNPs   False Positives   False Negatives
      Batch 13   12              1               1              0                 11
      Batch 15   5               1               0              1                 5
  • 18. Third Party Results: Polyphred
      • Polyphred detected SNPs in only 4 batches out of 35
      Batch      Existing SNPs   Detected SNPs   Correct SNPs   False Positives   False Negatives
      Batch 07   10              1               0              1                 10
      Batch 14   4               3               0              3                 4
      Batch 32   26              1               0              1                 26
      Batch 35   15              8               1              7                 14
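The per-batch counts in these tables can be turned into percentages as sketched below. The slides do not state the normalisation; this assumes that true positives and false negatives are expressed relative to the number of existing SNPs in the batch, which agrees with the "TP + FN + DP = 100%" note on the summary slide. The function name is hypothetical, and false positives are left out because their normalisation is not given.

```python
def batch_rates(existing, correct, false_negatives):
    """True-positive and false-negative percentages for one batch.

    Assumption: both are measured against the batch's existing SNPs.
    """
    tp = 100.0 * correct / existing
    fn = 100.0 * false_negatives / existing
    return round(tp, 1), round(fn, 1)


# Counts taken from the tables above.
polybayes_13 = batch_rates(existing=12, correct=1, false_negatives=11)
polyphred_35 = batch_rates(existing=15, correct=1, false_negatives=14)
```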
  • 19. Trimming Results
      • Average read size:
          – Before trimming: 690.15 bp
          – After trimming: 374.74 bp
          – Reduction of 45%
      • Average base coverage of the reference sequence:
          – Before trimming: 2.69
          – After trimming: 1.77
  • 20. Results: True Positive (%) x batch
  • 21. Results: False Negative (%) x batch
  • 22. Results: False Positive (%) x batch
  • 23. Results: Summary
                Polybayes       Polyphred       Area            Avg. Height
                Avg     SD      Avg     SD      Avg     SD      Avg     SD
      TP        0.3     1.4     0.2     1.1     75.4    19.2    52.6    21.5
      FN        99.7    1.4     99.8    1.1     23.2    18.4    45.6    21.7
      DP        0.0     0.0     0.0     0.0     1.4     4.3     1.8     4.0
      FP        2.9     16.9    11.1    31.3    393.9   312.3   554.4   511.3
      TP + FN + DP = 100%
  • 24. Discussion
      • Polybayes and Polyphred need large sets of data to produce good results
      • Our algorithm produces quite satisfactory results, taking into account the data characteristics:
          – Low average coverage
          – High amount of low-quality bases
          – High amount of polymorphisms (virus DNA)
      • The Area Ratio strategy produces better results than the Average Height strategy
  • 25. Future Work
      • Test the algorithms with larger batches, with higher average coverage, to improve the consensus algorithm
      • Reproduce the experiments using genetic sequences of more conserved life forms, such as mammals