wvcrieki
Introduction to bioinformatics and
computational biology
Lab for Bioinformatics and
computational genomics
10 “genome hackers”
mostly engineers (statistics)
42 scientists
technicians, geneticists, clinicians
>100 people
hardware engineers,
mathematicians, molecular biologists
What is Bioinformatics ?
• Application of information technology to
the storage, management and analysis of
biological information (Facilitated by the
use of computers)
– Sequence analysis?
– Molecular modeling (HTX) ?
– Phylogeny/evolution?
– Ecology and population studies?
– Medical informatics?
– Image Analysis ?
– Statistics ? AI ?
– Heavy current or light current (power engineering or electronics) ?
• Medicine (Pharma)
– Genome analysis allows the targeting of genetic
diseases
– The effect of a disease or of a therapeutic on
RNA and protein levels can be elucidated
– Knowledge of protein structure facilitates drug
design
– Understanding of genomic variation allows the
tailoring of medical treatment to the individual’s
genetic make-up
• The same techniques can be applied to crop (Agro)
and livestock improvement (Animal Health)
Promises of genomics and bioinformatics
Math
Informatics
Bioinformatics, a life science discipline … management of expectations
Theoretical Biology
Computational Biology
(Molecular)
Biology
Computer Science
Bioinformatics
Discovery Informatics – Computational Genomics
Interface Design
AI, Image Analysis
structure prediction (HTX)
Sequence Analysis
Expert Annotation
NP
Datamining
Time (years)
• Timeline: Margaret
Dayhoff …
Happy Birthday …
nature
the
Human
genome
Setting the stage …
Biological Research
Adapted from John McPherson, OICR
And this is just the beginning ….
Next Generation Sequencing is
here
One additional insight ...
Read Length is Not As Important For Resequencing
[Figure: % of paired k-mers with uniquely assignable location (y-axis, 0% to 100%) versus length of k-mer reads in bp (x-axis, 8 to 20), with one curve for E. coli and one for human.]
Jay Shendure
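The idea behind the plot can be tried on a toy sequence. The sketch below is a simplified illustration, not the analysis behind the figure: it counts single (not paired) k-mers on one strand only, and `unique_kmer_fraction` and `toy_genome` are hypothetical names introduced here.

```python
from collections import Counter


def unique_kmer_fraction(genome: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs exactly once."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    unique_positions = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique_positions / len(kmers)


# Hypothetical toy sequence; a real genome would be read from a FASTA file.
toy_genome = "ACGTACGTTTGACCA"
for k in (2, 4, 6):
    print(k, round(unique_kmer_fraction(toy_genome, k), 2))
```

Longer k-mers are less likely to recur by chance, so the uniquely placeable fraction rises with k and levels off once only true repeats remain.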
ABI SOLID
Paired End Reads are Important!
Repetitive DNA
Unique DNA
Single read maps to
multiple positions
Paired read maps uniquely
Read 1 Read 2
Known Distance
Single Molecule Sequencing
Helicos Biosciences Corp.
Microscope slide
Single DNA
molecule
dNTP-Cy3
* * *
*
primer
Super-cooled
TIRF microscope
Adapted from: Barak Cohen, Washington University, Bio5488 http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh
Complete genomics
Next next generation sequencing
Third generation sequencing
Now sequencing
Pacific Biosciences: A Third Generation Sequencing Technology
Eid et al 2008
Nanopore Sequencing
Ultra-low-cost SINGLE molecule sequencing
Genome Size
DOGS: Database Of Genome Sizes
E. coli = 4.2 x 10^6
Yeast = 18 x 10^6
Arabidopsis = 80 x 10^6
C. elegans = 100 x 10^6
Drosophila = 180 x 10^6
Human/Rat/Mouse = 3000 x 10^6
Lily = 300 000 x 10^6
With ... : 99.9 %
To primates: 99%
Anno 2012
Identity
The extent to which two (nucleotide or amino acid)
sequences are invariant.
Homology
Similarity attributed to descent from a common ancestor.
Definitions
RBP: 26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84
+ K ++ + + GTW++MA+ L + A V T + +L+ W+
glycodelin: 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81
Orthologous
Homologous sequences in different species
that arose from a common ancestral gene
during speciation; may or may not be responsible
for a similar function.
Paralogous
Homologous sequences within a single species
that arose by gene duplication.
Definitions
speciation
duplication
• Simple identity, which scores only identical amino
acids as a match.
• Genetic code changes, which scores the
minimum number of nucleotide changes to change
a codon for one amino acid into a codon for the
other.
• Chemical similarity of amino acid side chains,
which scores as a match two amino acids which
have a similar side chain, such as hydrophobic,
charged and polar amino acid groups.
• The Dayhoff percent accepted mutation (PAM)
family of matrices, which scores amino acid pairs
on the basis of the expected frequency of
substitution of one amino acid for the other during
protein evolution.
• The blocks substitution matrix (BLOSUM) amino
acid substitution tables, which scores amino acid
pairs based on the frequency of amino acid
substitutions in aligned sequence motifs called
blocks which are found in protein families
Overview
BLOSUM (BLOck – SUM) scoring
Block = ungapped alignment
E.g. amino acids D, N, V, A

     a b c d e f
1    D D N A A V
2    D N A V D D
3    N N V A V V

S = 3 sequences
W = 6 aa
N = (W*S*(S-1))/2 = 18 pairs
A. Observed pairs

     a b c d e f
1    D D N A A V
2    D N A V D D
3    N N V A V V

fij (observed pair counts)
     D   N   A   V
D    1
N    4   1
A    1   1   1
V    3   1   4   1

gij = fij / 18 (relative frequency table)
     D     N     A     V
D    .056
N    .222  .056
A    .056  .056  .056
V    .167  .056  .222  .056

Probability of obtaining a pair
if randomly choosing pairs
from block
B. Expected pairs

Residue counts in the block and their frequencies Pi
D    DDDDD    5/18
N    NNNN     4/18
A    AAAA     4/18
V    VVVVV    5/18

P{Draw DN pair} = P{Draw D, then N or Draw N, then D}
P{Draw DN pair} = PD PN + PN PD = 2 * (5/18) * (4/18) = .123

eij (random relative frequency table)
     D     N     A     V
D    .077
N    .123  .049
A    .123  .099  .049
V    .154  .123  .123  .077

Probability of obtaining a pair of
each amino acid drawn
independently from block
C. Summary (A/B)
sij = log2(gij/eij)
(sij) is the basic BLOSUM score matrix
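Steps A to C can be reproduced for the three-sequence example block with a few lines of Python. This is a minimal sketch under the slide's definitions only (one block, no sequence clustering, no scaling or rounding of the scores); the names `block`, `f`, `g`, `p`, `e` and `s` are chosen here to mirror fij, gij, Pi, eij and sij.

```python
from collections import Counter
from itertools import combinations
from math import log2

block = ["DDNAAV", "DNAVDD", "NNVAVV"]  # the example block from the slides

# A. Observed pairs: count every unordered residue pair within each column.
f = Counter()
for column in zip(*block):
    for a, b in combinations(column, 2):
        f[tuple(sorted((a, b)))] += 1
n_pairs = sum(f.values())  # W*S*(S-1)/2 = 18 for this block
g = {pair: count / n_pairs for pair, count in f.items()}

# B. Expected pairs: residues drawn independently with frequencies Pi.
residue_counts = Counter("".join(block))
total = sum(residue_counts.values())
p = {aa: count / total for aa, count in residue_counts.items()}
e = {}
for a in p:
    for b in p:
        if a <= b:
            e[(a, b)] = p[a] * p[b] if a == b else 2 * p[a] * p[b]

# C. Log-odds score for every pair that was observed in the block.
s = {pair: log2(g[pair] / e[pair]) for pair in g}

for pair in sorted(s):
    print(pair, f[pair], round(g[pair], 3), round(e[pair], 3), round(s[pair], 2))
```

A pair seen more often than expected by chance (for example D/N: .222 observed against .123 expected) gets a positive score, and a pair seen less often than expected gets a negative one.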
Notes:
• Observed pairs in blocks contain information about
relationships at all levels of evolutionary distance
simultaneously (cf. Dayhoff's close relationships)
• Actual algorithm generates observed + expected pair
distributions by accumulation over a set of approx. 2000
ungapped blocks of varying width (w) + depth (s)
• blosum30,35,40,45,50,55,60,62,65,70,75,80,85,90
• transition frequencies observed directly by identifying
blocks that are at least
– 45% identical (BLOSUM45)
– 50% identical (BLOSUM50)
– 62% identical (BLOSUM62) etc.
• No extrapolation made
• High blosum - closely related sequences
• Low blosum - distant sequences
• blosum45 ≈ pam250
• blosum62 ≈ pam160
• blosum62 is the most popular matrix
The BLOSUM Series
Overview
• Church of the Flying Spaghetti Monster
• http://www.venganza.org/about/open-letter
– Henikoff and Henikoff have compared the
BLOSUM matrices to PAM by evaluating how
effectively the matrices can detect known members
of a protein family from a database when searching
with the ungapped local alignment program
BLAST. They conclude that overall the BLOSUM
62 matrix is the most effective.
• However, all the substitution matrices investigated
perform better than BLOSUM 62 for a proportion of
the families. This suggests that no single matrix is
the complete answer for all sequence comparisons.
• It is probably best to complement the BLOSUM 62
matrix with comparisons using the PAM250 matrix and
the Overington structurally derived matrices.
– It seems likely that as more protein three
dimensional structures are determined, substitution
tables derived from structure comparison will give
the most reliable data.
Overview
Rat versus
mouse RBP
Rat versus
bacterial
lipocalin
• Exhaustive …
– All combinations
• Algorithm
– Dynamic programming (much faster)
– Needleman-Wunsch for global
alignments
(Journal of Molecular Biology, 1970)
– Later adapted by Smith and Waterman
for local alignment
• Heuristics
Alignments
A metric …
GACGGATTAG, GATCGGAATAG
GA-CGGATTAG
GATCGGAATAG
+1 (a match), -1 (a mismatch),-2 (gap)
9*1 + 1*(-1)+1*(-2) = 6
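The metric can be written as a small function that walks the two aligned rows column by column. This is a minimal sketch: `score_alignment` is a name introduced here, and the default values are the +1 / -1 / -2 scheme stated on the slide.

```python
def score_alignment(row1: str, row2: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Sum the column scores of two already-aligned rows ('-' marks a gap)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


# 9 matches, 1 mismatch, 1 gap: 9*1 + 1*(-1) + 1*(-2) = 6
print(score_alignment("GA-CGGATTAG", "GATCGGAATAG"))
```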
Needleman-Wunsch-edu.pl
The Score Matrix
----------------
      Seq1(j)      1   2   3   4   5   6   7
Seq2           *   C   K   H   V   F   C   R
(i)   *        0  -1  -2  -3  -4  -5  -6  -7
 1    C       -1   1   0  -1  -2  -3  -4  -5
 2    K       -2   0   2   1   0  -1  -2  -3
 3    K       -3  -1   1   1   0  -1  -2  -3
 4    C       -4  -2   0   0   0  -1   0  -1
 5    F       -5  -3  -1  -1  -1   1   0  -1
 6    C       -6  -4  -2  -2  -2   0   2   1
 7    K       -7  -5  -3  -3  -3  -1   1   1
 8    C       -8  -6  -4  -4  -4  -2   0   0
 9    V       -9  -7  -5  -5  -3  -3  -1  -1
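Each cell matrix(i,j) is the maximum of three candidate scores,
taken from the diagonal (a), upper (b) and left (c) neighbours:
A: diagonal_score = matrix(i-1,j-1) + MATCH
   if (substr(seq1,j-1,1) eq substr(seq2,i-1,1)), otherwise + MISMATCH
B: up_score = matrix(i-1,j) + GAP
C: left_score = matrix(i,j-1) + GAP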
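The fill step can be sketched in Python as follows. This is not the Needleman-Wunsch-edu.pl script itself: the function name is introduced here, and the scoring values (match +1, mismatch -1, gap -1) are an assumption, chosen because they reproduce the score matrix shown for CKHVFCR against CKKCFCKCV.

```python
def needleman_wunsch_matrix(seq1: str, seq2: str, match: int = 1,
                            mismatch: int = -1, gap: int = -1) -> list:
    """Fill the global alignment score matrix (rows: seq2, columns: seq1)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    matrix = [[0] * cols for _ in range(rows)]
    # Initialization: the first row and column are runs of gaps.
    for j in range(1, cols):
        matrix[0][j] = matrix[0][j - 1] + gap
    for i in range(1, rows):
        matrix[i][0] = matrix[i - 1][0] + gap
    # Fill: each cell is the best of the diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if seq1[j - 1] == seq2[i - 1] else mismatch
            diagonal_score = matrix[i - 1][j - 1] + step
            up_score = matrix[i - 1][j] + gap
            left_score = matrix[i][j - 1] + gap
            matrix[i][j] = max(diagonal_score, up_score, left_score)
    return matrix


for row in needleman_wunsch_matrix("CKHVFCR", "CKKCFCKCV"):
    print(" ".join(f"{value:3d}" for value in row))
```

The bottom-right cell holds the score of the best global alignment; the alignment itself is recovered by tracing back from that cell through the moves that produced each maximum.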
Needleman-Wunsch-edu.pl
Seq1:CKHVFCRVCI
Seq2:CKKCFC-KCV
++--++--+- score = 0
• Practicum: use similarity function in
initialization step -> scoring tables
• Time Complexity
• Use random proteins to generate
histogram of scores from aligned
random sequences
Time complexity with needleman-wunsch.pl

Sequence length (aa)    Execution time (s)
10                      0
25                      0
50                      0
100                     1
500                     5
1000                    19
2500                    559
5000                    Memory could not be written
Average around -64 !
-80
-78
-76
-74
-72 **
-70 *******
-68 ***************
-66 *************************
-64 ************************************************************
-60 ***********************
-58 ***************
-56 ********
-54 ****
-52 *
-50
-48
-46
-44
-42
-40
-38
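A histogram of this kind can be generated with a short script. The sketch below rests on assumptions that are not in the slides: uniformly distributed amino acids, sequences of length 30, and a +1 / -1 / -1 scoring scheme, so it will not reproduce the average of -64 shown above. All function names are introduced here for illustration.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def global_score(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score, keeping only two rows of the matrix in memory."""
    previous = [j * gap for j in range(len(seq1) + 1)]
    for i in range(1, len(seq2) + 1):
        current = [i * gap]
        for j in range(1, len(seq1) + 1):
            step = match if seq1[j - 1] == seq2[i - 1] else mismatch
            current.append(max(previous[j - 1] + step,
                               previous[j] + gap,
                               current[j - 1] + gap))
        previous = current
    return previous[-1]


def random_protein(length, rng):
    """Random sequence with uniform amino acid frequencies (an assumption)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def random_scores(trials=200, length=30, seed=0):
    rng = random.Random(seed)
    return [global_score(random_protein(length, rng), random_protein(length, rng))
            for _ in range(trials)]


scores = random_scores()
histogram = Counter(scores)
for value in sorted(histogram):
    print(f"{value:4d} {'*' * histogram[value]}")
print("mean:", sum(scores) / len(scores))
```

The score of a real alignment can then be judged against this background: a score far to the right of the random distribution is unlikely to arise by chance.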
If the sequences are similar, the path
of the best alignment should be very
close to the main diagonal.
Therefore, we may not need to fill the
entire matrix, rather, we fill a narrow
band of entries around the main
diagonal.
An algorithm that fills in a band of
width 2k+1 around the main
diagonal.
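A banded fill can be sketched as below. This is an illustrative version, not a reference implementation: `banded_global_score` is a name introduced here, the scoring defaults are the +1 / -1 / -2 metric from the earlier example, and for simplicity the full matrix is still allocated even though only the cells with |i - j| <= k are computed.

```python
NEG_INF = float("-inf")


def banded_global_score(seq1, seq2, k, match=1, mismatch=-1, gap=-2):
    """Global alignment score using only cells within k of the main diagonal."""
    n, m = len(seq1), len(seq2)
    if abs(n - m) > k:
        raise ValueError("band too narrow to reach the final cell")
    score = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        # Only columns i-k .. i+k are filled: a band of width 2k+1.
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                step = match if seq1[i - 1] == seq2[j - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + step)
            if i > 0:
                best = max(best, score[i - 1][j] + gap)  # cell above
            if j > 0:
                best = max(best, score[i][j - 1] + gap)  # cell to the left
            score[i][j] = best
    return score[n][m]


print(banded_global_score("GACGGATTAG", "GATCGGAATAG", k=2))
```

The band reduces the work from the order of n*m cells to the order of n*k cells, at the price of missing any alignment whose path leaves the band. A production version would also store only the band.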
Multiple Alignment Method
Phylogenetic methods may be used to
solve crimes, test purity of products, and
determine whether endangered species
have been smuggled or mislabeled:
– Vogel, G. 1998. HIV strain analysis debuts in
murder trial. Science 282(5390): 851-853.
– Lau, D. T.-W., et al. 2001. Authentication of
medicinal Dendrobium species by the internal
transcribed spacer of ribosomal DNA. Planta
Med 67:456-460.
Examples
– Epidemiologists use phylogenetic methods to
understand the development of pandemics,
patterns of disease transmission, and
development of antimicrobial resistance or
pathogenicity:
• Basler, C.F., et al. 2001. Sequence of the 1918
pandemic influenza virus nonstructural gene (NS)
segment and characterization of recombinant viruses
bearing the 1918 NS genes. PNAS, 98(5):2746-2751.
• Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV
transmission in a dental practice. Science
256(5060):1165-1171.
• Bacillus anthracis:
Examples
Tree Of Life
Modeling
Ramachandran / Phi-Psi Plot
Protein Architecture
• Finding a structural homologue
• BLAST
– versus the PDB database, or PSI-BLAST (E<0.005)
– Domain coverage at least 60%
• Avoid gaps
– Prefer few gaps and reasonable similarity scores
over many gaps and high similarity scores
Modeling
Bootstrapping - an example
Ciliate SSUrDNA - parsimony bootstrap
Majority-rule consensus
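[Figure: majority-rule consensus tree of Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Euplotes (8), Tetrahymena (9), Loxodes (4), Tracheloraphis (5), Spirostomum (6) and Gruberia (7); bootstrap support values on the internal branches: 100, 96, 84, 100, 100, 100.]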
Overview
Personalized Medicine,
Biomarkers …
… Molecular Profiling
First Generation Molecular Profiling
Next Generation Molecular Profiling
Next Generation Epigenetic Profiling
Concluding Remarks
CONFIDENTIAL
Defining Epigenetics
• Reversible changes in gene
expression/function
• Without changes in DNA
sequence
• Can be inherited from
precursor cells
• Allows intrinsic signals to be integrated
with environmental signals
(including diet)
Methylation I Epigenetics | Oncology | Biomarker
Genome
DNA
Gene Expression
Epigenome
Chromatin
Phenotype
I NEXT-GEN | PharmacoDX | CRC
Epigenetic Regulation:
Post Translational Modifications to Histones and Base Changes in DNA
• Epigenetic modifications of histones and DNA include:
– Histone acetylation and methylation, and DNA methylation
[Figure: histone acetylation (Ac), histone methylation (Me) and DNA methylation (Me).]
MGMT Biology
O6 Methyl-Guanine
Methyl Transferase
Essential DNA Repair Enzyme
Removes alkyl groups from damaged guanine
bases
Healthy individual:
- MGMT is an essential DNA repair enzyme
- Loss of MGMT activity makes individuals susceptible
to DNA damage and prone to tumor development
Glioblastoma patient on alkylator chemotherapy:
- Patients with MGMT promoter methylation
have longer PFS and OS when alkylating
agents are used as chemotherapy
MGMT Promoter
Methylation Predicts
Benefit from DNA-Alkylating Chemotherapy
Post-hoc subgroup analysis of a temozolomide clinical trial in primary glioblastoma
patients shows a benefit for patients with MGMT promoter methylation
[Figure: bar chart of median overall survival (months, axis 0 to 25) by MGMT status (methylated versus non-methylated MGMT gene) and treatment (radiotherapy versus radiotherapy plus temozolomide); labelled values: 21.7 months and 12.7 months.]
Adapted from Hegi et al.
NEJM 2005
352(10):1036-8.
Study with 207 patients
Genome-wide methylation
by methylation sensitive restriction enzymes
Genome-wide methylation
by probes
# samples
# markers
Genome-wide methylation
…. by next generation sequencing
Discovery
Verification
Validation
MBD_Seq
DNA Sheared
Immobilized
Methyl Binding Domain
Condensed Chromatin
DNA Sheared
Immobilized
Methyl binding domain
MgCl2
Next Gen Sequencing
GA Illumina: 100 million reads
MBD_Seq
MBD_Seq
MGMT = dual core
# samples
# markers
MBD_Seq
Genome-wide methylation
…. by next generation sequencing
Discovery
1-2 million
methylation
cores
Data integration
Correlation tracks
[Figure: two scatter plots of methylation versus expression, one with Corr = -1 and one with Corr = 1.]
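The value plotted in a correlation track can be illustrated with a plain Pearson correlation between the methylation and expression of one locus across samples. This is a sketch under that assumption; the slides do not state which correlation measure is used, and the sample values below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return covariance / sqrt(var_x * var_y)


# Hypothetical values for one locus measured in five samples.
methylation = [0.1, 0.3, 0.5, 0.7, 0.9]
expression_silenced = [9.0, 7.0, 5.0, 3.0, 1.0]  # falls as methylation rises
expression_coupled = [1.0, 3.0, 5.0, 7.0, 9.0]   # rises with methylation

print(pearson(methylation, expression_silenced))
print(pearson(methylation, expression_coupled))
```

Computing this value per locus along the genome gives one track that ranges from -1 (methylation tracks with silencing) to +1.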
Correlation track
in GBM @ MGMT
+1
-1
# samples
# markers
MBD_Seq
454_BT_Seq
MSP
Genome-wide methylation
…. by next generation sequencing
Discovery
Verification
Validation
GCATCGTGACTTACGACTGATCGATGGATGCTAGCAT
unmethylated alleles
less methylation
methylated alleles
more methylation
Deep Sequencing
Deep MGMT
Heterogenic complexity
biobix
wvcrieki
biobix.be
bioinformatics.be

Mais conteúdo relacionado

Mais procurados

Bioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searchingBioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searching
Prof. Wim Van Criekinge
 
Blast bioinformatics
Blast bioinformaticsBlast bioinformatics
Blast bioinformatics
atmapandey
 
Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool (BLAST)Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool (BLAST)
Asiri Wijesinghe
 

Mais procurados (15)

blast and fasta
 blast and fasta blast and fasta
blast and fasta
 
Presentation for blast algorithm bio-informatice
Presentation for blast algorithm bio-informaticePresentation for blast algorithm bio-informatice
Presentation for blast algorithm bio-informatice
 
Bioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searchingBioinformatica 10-11-2011-t5-database searching
Bioinformatica 10-11-2011-t5-database searching
 
Tobias marschall haplotype aware genotyping
Tobias marschall haplotype aware genotypingTobias marschall haplotype aware genotyping
Tobias marschall haplotype aware genotyping
 
Lineage-driven Fault Injection, SIGMOD'15
Lineage-driven Fault Injection, SIGMOD'15Lineage-driven Fault Injection, SIGMOD'15
Lineage-driven Fault Injection, SIGMOD'15
 
Karen miga centromere sequence characterization and variant detection
Karen miga centromere sequence characterization and variant detectionKaren miga centromere sequence characterization and variant detection
Karen miga centromere sequence characterization and variant detection
 
Bioinformatics t5-databasesearching v2014
Bioinformatics t5-databasesearching v2014Bioinformatics t5-databasesearching v2014
Bioinformatics t5-databasesearching v2014
 
Ashg2014 grc workshop_schneider
Ashg2014 grc workshop_schneiderAshg2014 grc workshop_schneider
Ashg2014 grc workshop_schneider
 
Mayank
MayankMayank
Mayank
 
Blast bioinformatics
Blast bioinformaticsBlast bioinformatics
Blast bioinformatics
 
Topological associated domains- Hi-C
Topological associated domains- Hi-CTopological associated domains- Hi-C
Topological associated domains- Hi-C
 
Introduction to Bayesian phylogenetics and BEAST
Introduction to Bayesian phylogenetics and BEASTIntroduction to Bayesian phylogenetics and BEAST
Introduction to Bayesian phylogenetics and BEAST
 
Ashg2015 schneider final
Ashg2015 schneider finalAshg2015 schneider final
Ashg2015 schneider final
 
Sequence comparison techniques
Sequence comparison techniquesSequence comparison techniques
Sequence comparison techniques
 
Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool (BLAST)Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool (BLAST)
 

Destaque

Bioinformatica t9-t10-biocheminformatics
Bioinformatica t9-t10-biocheminformaticsBioinformatica t9-t10-biocheminformatics
Bioinformatica t9-t10-biocheminformatics
Prof. Wim Van Criekinge
 

Destaque (20)

2015 bioinformatics bio_python_part3
2015 bioinformatics bio_python_part32015 bioinformatics bio_python_part3
2015 bioinformatics bio_python_part3
 
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
Bioinformatics t9-t10-bio cheminformatics-wimvancriekinge_v2013
 
2015 bioinformatics bio_python_partii
2015 bioinformatics bio_python_partii2015 bioinformatics bio_python_partii
2015 bioinformatics bio_python_partii
 
2015 04 22_time_labs_shared
2015 04 22_time_labs_shared2015 04 22_time_labs_shared
2015 04 22_time_labs_shared
 
2012 12 12_adam_v_final
2012 12 12_adam_v_final2012 12 12_adam_v_final
2012 12 12_adam_v_final
 
Thesis2014
Thesis2014Thesis2014
Thesis2014
 
Bioinformatica 06-10-2011-t2-databases
Bioinformatica 06-10-2011-t2-databasesBioinformatica 06-10-2011-t2-databases
Bioinformatica 06-10-2011-t2-databases
 
Mini symposium
Mini symposiumMini symposium
Mini symposium
 
Bioinformatica t9-t10-biocheminformatics
Bioinformatica t9-t10-biocheminformaticsBioinformatica t9-t10-biocheminformatics
Bioinformatica t9-t10-biocheminformatics
 
Bioinformatica 29-09-2011-p1-introduction
Bioinformatica 29-09-2011-p1-introductionBioinformatica 29-09-2011-p1-introduction
Bioinformatica 29-09-2011-p1-introduction
 
2015 bioinformatics python_io_wim_vancriekinge
2015 bioinformatics python_io_wim_vancriekinge2015 bioinformatics python_io_wim_vancriekinge
2015 bioinformatics python_io_wim_vancriekinge
 
Bioinformatics p5-bioperlv2014
Bioinformatics p5-bioperlv2014Bioinformatics p5-bioperlv2014
Bioinformatics p5-bioperlv2014
 
2015 09 imec_wim_vancriekinge_v42_to_present
2015 09 imec_wim_vancriekinge_v42_to_present2015 09 imec_wim_vancriekinge_v42_to_present
2015 09 imec_wim_vancriekinge_v42_to_present
 
Bioinformatica t8-go-hmm
Bioinformatica t8-go-hmmBioinformatica t8-go-hmm
Bioinformatica t8-go-hmm
 
Bioinformatics v2014 wim_vancriekinge
Bioinformatics v2014 wim_vancriekingeBioinformatics v2014 wim_vancriekinge
Bioinformatics v2014 wim_vancriekinge
 
2012 12 02_epigenetic_profiling_environmental_health_sciences
2012 12 02_epigenetic_profiling_environmental_health_sciences2012 12 02_epigenetic_profiling_environmental_health_sciences
2012 12 02_epigenetic_profiling_environmental_health_sciences
 
Van criekinge next_generation_epigenetic_profling_vlille
Van criekinge next_generation_epigenetic_profling_vlilleVan criekinge next_generation_epigenetic_profling_vlille
Van criekinge next_generation_epigenetic_profling_vlille
 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014
 
Bioinformatica 27-10-2011-t4-alignments
Bioinformatica 27-10-2011-t4-alignmentsBioinformatica 27-10-2011-t4-alignments
Bioinformatica 27-10-2011-t4-alignments
 
Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014
 

Semelhante a Bioinformatics life sciences_v2015

20100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture0720100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture07
Computer Science Club
 
Project Presentation
Project PresentationProject Presentation
Project Presentation
butest
 
MMseqs NGS 2014
MMseqs NGS 2014MMseqs NGS 2014
MMseqs NGS 2014
Martin Steinegger
 
OpenCL applications in genomics
OpenCL applications in genomicsOpenCL applications in genomics
OpenCL applications in genomics
USC
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Natalio Krasnogor
 

Semelhante a Bioinformatics life sciences_v2015 (20)

20100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture0720100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture07
 
Project Presentation
Project PresentationProject Presentation
Project Presentation
 
Sequence Analysis.ppt
Sequence Analysis.pptSequence Analysis.ppt
Sequence Analysis.ppt
 
MMseqs NGS 2014
MMseqs NGS 2014MMseqs NGS 2014
MMseqs NGS 2014
 
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
 
Metaheuristic Tuning of Type-II Fuzzy Inference System for Data Mining
Metaheuristic Tuning of Type-II Fuzzy Inference System for Data MiningMetaheuristic Tuning of Type-II Fuzzy Inference System for Data Mining
Metaheuristic Tuning of Type-II Fuzzy Inference System for Data Mining
 
bioinformatic.pptx
bioinformatic.pptxbioinformatic.pptx
bioinformatic.pptx
 
UMBC Research Day Presentation
UMBC Research Day PresentationUMBC Research Day Presentation
UMBC Research Day Presentation
 
Clustering and Visualisation using R programming
Clustering and Visualisation using R programmingClustering and Visualisation using R programming
Clustering and Visualisation using R programming
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data
 
OpenCL applications in genomics
OpenCL applications in genomicsOpenCL applications in genomics
OpenCL applications in genomics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
 
_BLAST.ppt
_BLAST.ppt_BLAST.ppt
_BLAST.ppt
 
BITS: Basics of Sequence similarity
BITS: Basics of Sequence similarityBITS: Basics of Sequence similarity
BITS: Basics of Sequence similarity
 
Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014
 
Similarity
SimilaritySimilarity
Similarity
 
BLAST and sequence alignment
BLAST and sequence alignmentBLAST and sequence alignment
BLAST and sequence alignment
 
Ch06 alignment
Ch06 alignmentCh06 alignment
Ch06 alignment
 
Theory to consider an inaccurate testing and how to determine the prior proba...
Theory to consider an inaccurate testing and how to determine the prior proba...Theory to consider an inaccurate testing and how to determine the prior proba...
Theory to consider an inaccurate testing and how to determine the prior proba...
 

Mais de Prof. Wim Van Criekinge

Mais de Prof. Wim Van Criekinge (20)

2020 02 11_biological_databases_part1
2020 02 11_biological_databases_part12020 02 11_biological_databases_part1
2020 02 11_biological_databases_part1
 
2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload
 
2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload
 
2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload
 
2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload
 
P7 2018 biopython3
P7 2018 biopython3P7 2018 biopython3
P7 2018 biopython3
 
P6 2018 biopython2b
P6 2018 biopython2bP6 2018 biopython2b
P6 2018 biopython2b
 
P4 2018 io_functions
P4 2018 io_functionsP4 2018 io_functions
P4 2018 io_functions
 
P3 2018 python_regexes
P3 2018 python_regexesP3 2018 python_regexes
P3 2018 python_regexes
 
T1 2018 bioinformatics
T1 2018 bioinformaticsT1 2018 bioinformatics
T1 2018 bioinformatics
 
P1 2018 python
P1 2018 pythonP1 2018 python
P1 2018 python
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]
 
2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql
 
2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload
 
2018 03 20_biological_databases_part3
2018 03 20_biological_databases_part32018 03 20_biological_databases_part3
2018 03 20_biological_databases_part3
 
2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
P7 2017 biopython3
P7 2017 biopython3P7 2017 biopython3
P7 2017 biopython3
 
P6 2017 biopython2
P6 2017 biopython2P6 2017 biopython2
P6 2017 biopython2
 

Último

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 

Último (20)

Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...

Bioinformatics life sciences_v2015

  • 2.
  • 3. Introduction to bioinformatics and computational biology
  • 4. Lab for Bioinformatics and computational genomics 10 “genome hackers” mostly engineers (statistics) 42 scientists technicians, geneticists, clinicians >100 people hardware engineers, mathematicians, molecular biologists
  • 5. What is Bioinformatics? • Application of information technology to the storage, management and analysis of biological information (facilitated by the use of computers) – Sequence analysis? – Molecular modeling (HTX)? – Phylogeny/evolution? – Ecology and population studies? – Medical informatics? – Image analysis? – Statistics? AI? – High-voltage or low-voltage engineering (sterkstroom of zwakstroom)?
  • 6. • Medicine (Pharma) – Genome analysis allows the targeting of genetic diseases – The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated – Knowledge of protein structure facilitates drug design – Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up • The same techniques can be applied to crop (Agro) and livestock improvement (Animal Health) Promises of genomics and bioinformatics
  • 7. Math Informatics Bioinformatics, a life science discipline … management of expectations Theoretical Biology Computational Biology (Molecular) Biology Computer Science Bioinformatics Discovery Informatics – Computational Genomics Interface Design AI, Image Analysis structure prediction (HTX) Sequence Analysis Expert Annotation NP Datamining
  • 9.
  • 13. Biological Research Adapted from John McPherson, OICR
  • 14. And this is just the beginning …. Next Generation Sequencing is here
  • 15.
  • 16.
  • 18. Read Length is Not As Important For Resequencing [Figure: % of paired k-mers with uniquely assignable location (0% to 100%) versus length of k-mer reads (8 to 20 bp), for E. coli and human] Jay Shendure
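The idea behind the figure on slide 18 can be sketched in a few lines: count how many k-mer positions in a sequence carry a k-mer that occurs only once. This is a simplified illustration, not the analysis behind the slide: it uses single (unpaired) k-mers, one strand only, and a made-up toy sequence; the function name `unique_kmer_fraction` is hypothetical.

```python
from collections import Counter


def unique_kmer_fraction(genome: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs exactly once in `genome`."""
    if k <= 0 or k > len(genome):
        raise ValueError("k must be between 1 and len(genome)")
    # Every start position yields one k-mer.
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    unique = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique / len(kmers)


# Toy sequence (an assumption for illustration, not real genome data).
toy = "ACGTACGTTTGACCA"
for k in (2, 4, 6):
    print(k, round(unique_kmer_fraction(toy, k), 2))
```

Longer k-mers are more likely to be unique, which is why the curve in the figure rises with read length and saturates sooner for a small genome than for a large, repetitive one.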
  • 19.
  • 21. Paired End Reads are Important! Repetitive DNA Unique DNA Single read maps to multiple positions Paired read maps uniquely Read 1 Read 2 Known Distance
  • 22. Single Molecule Sequencing Helicos Biosciences Corp. Microscope slide Single DNA molecule dNTP-Cy3 * * * * primer Super-cooled TIRF microscope Adapted from: Barak Cohen, Washington University, Bio5488 http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh
  • 24. Next next generation sequencing Third generation sequencing Now sequencing
  • 25. Pacific Biosciences: A Third Generation Sequencing Technology Eid et al 2008
  • 28. Genome Size DOGS: Database Of Genome Sizes E. coli = 4.2 × 10^6 Yeast = 18 × 10^6 Arabidopsis = 80 × 10^6 C. elegans = 100 × 10^6 Drosophila = 180 × 10^6 Human/Rat/Mouse = 3000 × 10^6 Lily = 300 000 × 10^6 With ... : 99.9 % To primates: 99%
  • 29.
  • 32. Identity The extent to which two (nucleotide or amino acid) sequences are invariant. Homology Similarity attributed to descent from a common ancestor. Definitions RBP: 26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84 + K ++ + + GTW++MA+ L + A V T + +L+ W+ glycodelin: 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81
  • 33. Orthologous Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function. Paralogous Homologous sequences within a single species that arose by gene duplication. Definitions
  • 35. • Simple identity, which scores only identical amino acids as a match. • Genetic code changes, which scores the minimum number of nucleotide changes to change a codon for one amino acid into a codon for the other. • Chemical similarity of amino acid side chains, which scores as a match two amino acids which have a similar side chain, such as hydrophobic, charged and polar amino acid groups. • The Dayhoff percent accepted mutation (PAM) family of matrices, which scores amino acid pairs on the basis of the expected frequency of substitution of one amino acid for the other during protein evolution. • The blocks substitution matrix (BLOSUM) amino acid substitution tables, which score amino acid pairs based on the frequency of amino acid substitutions in aligned sequence motifs called blocks, which are found in protein families Overview
  • 36. BLOSUM (BLOck – SUM) scoring DDNAAV DNAVDD NNVAVV Block = ungapped alignment E.g. amino acids D N V A a b c d e f 1 2 3 S = 3 sequences W = 6 aa N = (W*S*(S-1))/2 = 18 pairs
  • 37. A. Observed pairs. Block: DDNAAV / DNAVDD / NNVAVV (columns a-f, sequences 1-3). Pair counts fij (lower triangle; rows and columns in order D, N, A, V): D: 1; N: 4, 1; A: 1, 1, 1; V: 3, 1, 4, 1. Relative frequency table gij = fij/18: D: .056; N: .222, .056; A: .056, .056, .056; V: .167, .056, .222, .056. gij is the probability of obtaining a pair if randomly choosing pairs from the block
  • 38. B. Expected pairs DDDDD NNNN AAAA VVVVV DDNAAV DNAVDD NNVAVV Pi 5/18 4/18 4/18 5/18 P{Draw DN pair} = P{Draw D, then N or Draw N, then D} P{Draw DN pair} = PD·PN + PN·PD = 2 * (5/18)*(4/18) = .123 D N A V D N A V .077 .123 .154 .123 .049 .123 .099 .049 .123 .049 eij: random relative frequency table. Probability of obtaining a pair of each amino acid drawn independently from block
  • 39. C. Summary (A/B) sij = log2(gij/eij); (sij) is the basic BLOSUM score matrix Notes: • Observed pairs in blocks contain information about relationships at all levels of evolutionary distance simultaneously (cf. Dayhoff’s close relationships) • The actual algorithm generates observed + expected pair distributions by accumulation over a set of approx. 2000 ungapped blocks of varying width (w) + depth (s)
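The three steps on slides 36 to 39 (observed pairs, expected pairs, log-odds) can be sketched for the toy block as follows. This is a minimal illustration under the slides' own simplifications: one block, no sequence clustering, no scaling or rounding of the scores. The variable and function names are assumptions, not taken from any BLOSUM implementation.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Toy ungapped block from the slides: S = 3 sequences, W = 6 columns.
block = ["DDNAAV", "DNAVDD", "NNVAVV"]

# A. Observed pairs: count every unordered residue pair within each column.
observed = Counter()
for column in zip(*block):
    for a, b in combinations(column, 2):
        observed[tuple(sorted((a, b)))] += 1
total_pairs = sum(observed.values())  # W * S * (S - 1) / 2 = 18

# B. Expected pairs: residues drawn independently from the block composition.
residue_counts = Counter("".join(block))
n_residues = sum(residue_counts.values())
p = {aa: count / n_residues for aa, count in residue_counts.items()}


def expected(a: str, b: str) -> float:
    """Probability of drawing the unordered pair (a, b) independently."""
    return p[a] * p[b] if a == b else 2 * p[a] * p[b]


# C. Log-odds score s_ij = log2(g_ij / e_ij) for every pair seen in the block.
scores = {
    pair: log2((count / total_pairs) / expected(*pair))
    for pair, count in observed.items()
}

for pair in sorted(scores):
    print(pair, observed[pair], round(scores[pair], 2))
```

A positive score means the pair is seen more often in the block than chance predicts; a negative score means it is seen less often.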
  • 40. • blosum30,35,40,45,50,55,60,62,65,70,75,80,85,90 • transition frequencies observed directly by identifying blocks that are at least – 45% identical (BLOSUM45) – 50% identical (BLOSUM50) – 62% identical (BLOSUM62) etc. • No extrapolation made • High blosum - closely related sequences • Low blosum - distant sequences • blosum45 ≈ pam250 • blosum62 ≈ pam160 • blosum62 is the most popular matrix The BLOSUM Series
  • 42. • Church of the Flying Spaghetti Monster • http://www.venganza.org/about/open-letter
  • 43. – Henikoff and Henikoff have compared the BLOSUM matrices to PAM by evaluating how effectively the matrices can detect known members of a protein family from a database when searching with the ungapped local alignment program BLAST. They conclude that overall the BLOSUM 62 matrix is the most effective. • However, all the substitution matrices investigated perform better than BLOSUM 62 for a proportion of the families. This suggests that no single matrix is the complete answer for all sequence comparisons. • It is probably best to complement the BLOSUM 62 matrix with comparisons using PAM250 and the Overington structurally derived matrices. – It seems likely that as more protein three-dimensional structures are determined, substitution tables derived from structure comparison will give the most reliable data. Overview
  • 44. Rat versus mouse RBP Rat versus bacterial lipocalin
  • 45. • Exhaustive … – All combinations: • Algorithm – Dynamic programming (much faster) • Heuristics – Needleman-Wunsch for global alignments (Journal of Molecular Biology, 1970) – Later adapted by Smith and Waterman for local alignment Alignments
  • 46. A metric … GACGGATTAG, GATCGGAATAG GA-CGGATTAG GATCGGAATAG +1 (a match), -1 (a mismatch),-2 (gap) 9*1 + 1*(-1)+1*(-2) = 6
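The metric on slide 46 can be checked with a short sketch that scores an already-aligned pair column by column. The function name `score_alignment` is hypothetical; the default scores (+1, -1, -2) are the ones given on the slide.

```python
def score_alignment(row1: str, row2: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Sum per-column scores of two aligned rows ('-' marks a gap)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score += gap        # gap column
        elif a == b:
            score += match      # identical residues
        else:
            score += mismatch   # substitution
    return score


print(score_alignment("GA-CGGATTAG", "GATCGGAATAG"))  # 9*1 + 1*(-1) + 1*(-2) = 6
```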
  • 47. Needleman-Wunsch-edu.pl: The Score Matrix (Seq1 across, index j; Seq2 down, index i)
                  1   2   3   4   5   6   7
       i      *   C   K   H   V   F   C   R
          *   0  -1  -2  -3  -4  -5  -6  -7
       1  C  -1   1   0  -1  -2  -3  -4  -5
       2  K  -2   0   2   1   0  -1  -2  -3
       3  K  -3  -1   1   1   0  -1  -2  -3
       4  C  -4  -2   0   0   0  -1   0  -1
       5  F  -5  -3  -1  -1  -1   1   0  -1
       6  C  -6  -4  -2  -2  -2   0   2   1
       7  K  -7  -5  -3  -3  -3  -1   1   1
       8  C  -8  -6  -4  -4  -4  -2   0   0
       9  V  -9  -7  -5  -5  -3  -3  -1  -1
  • 49. Needleman-Wunsch-edu.pl: The Score Matrix (same matrix as on slide 47), with the three candidate moves into cell (i,j) marked a, b, c. A: diagonal score = matrix(i-1,j-1) + MATCH if substr(seq1,j-1,1) eq substr(seq2,i-1,1), otherwise matrix(i-1,j-1) + MISMATCH. B: up_score = matrix(i-1,j) + GAP. C: left_score = matrix(i,j-1) + GAP. matrix(i,j) is the maximum of the three
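The fill step described by moves A, B and C can be sketched as below. This is a Python illustration, not the Perl script named on the slides. The default scores (match +1, mismatch -1, gap -1) are an assumption inferred from the numbers in the score matrix: with these values the sketch reproduces the matrix shown for CKHVFCR versus CKKCFCKCV.

```python
def needleman_wunsch_matrix(seq1: str, seq2: str, match: int = 1,
                            mismatch: int = -1, gap: int = -1) -> list:
    """Fill the global-alignment score matrix (seq1 across, seq2 down)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    matrix = [[0] * cols for _ in range(rows)]
    # Initialization: leading gaps along the first row and first column.
    for j in range(1, cols):
        matrix[0][j] = j * gap
    for i in range(1, rows):
        matrix[i][0] = i * gap
    # Recurrence: best of diagonal (A), up (B) and left (C).
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq1[j - 1] == seq2[i - 1]
            diag_score = matrix[i - 1][j - 1] + (match if same else mismatch)
            up_score = matrix[i - 1][j] + gap
            left_score = matrix[i][j - 1] + gap
            matrix[i][j] = max(diag_score, up_score, left_score)
    return matrix


m = needleman_wunsch_matrix("CKHVFCR", "CKKCFCKCV")
print(m[-1])      # last row: [-9, -7, -5, -5, -3, -3, -1, -1]
print(m[-1][-1])  # global alignment score: -1
```

The bottom-right cell holds the score of the best global alignment; a traceback from that cell (not shown here) recovers the alignment itself.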
  • 53. • Practicum: use similarity function in initialization step -> scoring tables • Time Complexity • Use random proteins to generate histogram of scores from aligned random sequences
  • 54. Time complexity with needleman-wunsch.pl
      Sequence length (aa)   Execution time (s)
      10                     0
      25                     0
      50                     0
      100                    1
      500                    5
      1000                   19
      2500                   559
      5000                   Memory could not be written
  • 55. Average around -64 !
      -80
      -78
      -76
      -74
      -72 **
      -70 *******
      -68 ***************
      -66 *************************
      -64 ************************************************************
      -60 ***********************
      -58 ***************
      -56 ********
      -54 ****
      -52 *
      -50
      -48
      -46
      -44
      -42
      -40
      -38
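A histogram like the one on slide 55 can be produced by aligning many pairs of random proteins and tallying the scores. The sketch below is only an illustration: the scoring values, sequence length, number of pairs and random seed are all assumptions, so the scores it prints will not match the slide's average of about -64, which depends on the scoring table used in the practicum.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nw_score(seq1: str, seq2: str, match: int = 1,
             mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score, keeping only two rows of the matrix."""
    prev = [j * gap for j in range(len(seq1) + 1)]
    for i in range(1, len(seq2) + 1):
        curr = [i * gap]
        for j in range(1, len(seq1) + 1):
            same = seq1[j - 1] == seq2[i - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def random_protein(length: int, rng: random.Random) -> str:
    """Uniformly random amino acid string (real proteins are not uniform)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def score_histogram(n_pairs: int = 200, length: int = 50, seed: int = 0) -> Counter:
    """Count how often each alignment score occurs among random pairs."""
    rng = random.Random(seed)
    return Counter(
        nw_score(random_protein(length, rng), random_protein(length, rng))
        for _ in range(n_pairs)
    )


hist = score_histogram()
for score in sorted(hist):
    print(f"{score:4d} {'*' * hist[score]}")
```

The resulting distribution gives a baseline: a real alignment score is only interesting if it lies well to the right of the scores obtained from random sequences of the same length.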
  • 56. If the sequences are similar, the path of the best alignment should be very close to the main diagonal. Therefore, we may not need to fill the entire matrix, rather, we fill a narrow band of entries around the main diagonal. An algorithm that fills in a band of width 2k+1 around the main diagonal.
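The banded idea on slide 56 can be sketched as follows: only cells with |i - j| <= k are computed, and anything outside the band is treated as unreachable. The function name `banded_nw_score` and the scoring defaults are assumptions for illustration. The result equals the full Needleman-Wunsch score only when the best alignment stays inside the band.

```python
def banded_nw_score(seq1: str, seq2: str, k: int, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Global alignment score restricted to a band of width 2k+1."""
    n, m = len(seq1), len(seq2)
    if abs(n - m) > k:
        raise ValueError("band too narrow to reach the final cell")
    neg_inf = float("-inf")  # score of any cell outside the band
    # Row 0: only the columns inside the band are stored.
    prev = {j: j * gap for j in range(min(n, k) + 1)}
    for i in range(1, m + 1):
        curr = {}
        for j in range(max(0, i - k), min(n, i + k) + 1):
            if j == 0:
                curr[j] = i * gap  # first column: leading gaps only
                continue
            same = seq1[j - 1] == seq2[i - 1]
            diag = prev.get(j - 1, neg_inf) + (match if same else mismatch)
            up = prev.get(j, neg_inf) + gap
            left = curr.get(j - 1, neg_inf) + gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[n]


print(banded_nw_score("GATTACA", "GATTACA", k=0))      # only the main diagonal: 7
print(banded_nw_score("CKHVFCR", "CKKCFCKCV", k=9))    # band covers the full matrix: -1
```

Each row touches at most 2k+1 cells, so the work drops from about n·m cells to about (2k+1)·m, which matters for the long sequences in the timing table.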
  • 59. Phylogenetic methods may be used to solve crimes, test purity of products, and determine whether endangered species have been smuggled or mislabeled: – Vogel, G. 1998. HIV strain analysis debuts in murder trial. Science 282(5390): 851-853. – Lau, D. T.-W., et al. 2001. Authentication of medicinal Dendrobium species by the internal transcribed spacer of ribosomal DNA. Planta Med 67:456-460. Examples
  • 60.
  • 61. – Epidemiologists use phylogenetic methods to understand the development of pandemics, patterns of disease transmission, and development of antimicrobial resistance or pathogenicity: • Basler, C.F., et al. 2001. Sequence of the 1918 pandemic influenza virus nonstructural gene (NS) segment and characterization of recombinant viruses bearing the 1918 NS genes. PNAS, 98(5):2746-2751. • Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV transmission in a dental practice. Science 256(5060):1165-1171. • Bacillus anthracis: Examples
  • 63.
  • 64.
  • 65.
  • 69. • Finding a structural homologue • BLAST – versus PDB database or PSI-BLAST (E<0.005) – Domain coverage at least 60% • Avoid gaps – Prefer few gaps and reasonable similarity scores over lots of gaps and high similarity scores Modeling
  • 70. Bootstrapping - an example Ciliate SSUrDNA - parsimony bootstrap Majority-rule consensus Ochromonas (1) Symbiodinium (2) Prorocentrum (3) Euplotes (8) Tetrahymena (9) Loxodes (4) Tracheloraphis (5) Spirostomum (6) Gruberia (7) 100 96 84 100 100 100
  • 71.
  • 72.
  • 73. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  • 74. CONFIDENTIAL Defining Epigenetics • Reversible changes in gene expression/function • Without changes in DNA sequence • Can be inherited from precursor cells • Allows integration of intrinsic with environmental signals (including diet) Methylation I Epigenetics | Oncology | Biomarker Genome DNA Gene Expression Epigenome Chromatin Phenotype I NEXT-GEN | PharmacoDX | CRC
  • 75.
  • 76. Epigenetic Regulation: Post-Translational Modifications to Histones and Base Changes in DNA • Epigenetic modifications of histones and DNA include: – Histone acetylation and methylation, and DNA methylation Histone Acetylation Histone Methylation DNA Methylation Me Me Ac Me
  • 77.
  • 78. MGMT Biology: O6-Methylguanine Methyltransferase. Essential DNA repair enzyme; removes alkyl groups from damaged guanine bases. Healthy individual: - MGMT is an essential DNA repair enzyme; loss of MGMT activity makes individuals susceptible to DNA damage and prone to tumor development. Glioblastoma patient on alkylator chemotherapy: - Patients with MGMT promoter methylation have longer PFS and OS with the use of alkylating agents as chemotherapy
  • 79. MGMT Promoter Methylation Predicts Benefit from DNA-Alkylating Chemotherapy. Post-hoc subgroup analysis of the temozolomide clinical trial with primary glioblastoma patients shows benefit for patients with MGMT promoter methylation. [Figure: median overall survival (months) with radiotherapy plus temozolomide: 21.7 months for methylated MGMT gene, 12.7 months for non-methylated MGMT gene] Adapted from Hegi et al. NEJM 2005 352(10):1036-8. Study with 207 patients
  • 80. Genome-wide methylation by methylation sensitive restriction enzymes
  • 81. Genome-wide methylation by probes
  • 82. Genome-wide methylation … by next generation sequencing [Figure: # samples versus # markers across the phases Discovery, Verification, Validation]
  • 83. MBD_Seq DNA Sheared Immobilized Methyl Binding Domain Condensed Chromatin DNA Sheared
  • 84. MBD_Seq Immobilized Methyl Binding Domain MgCl2 Next Gen Sequencing GA Illumina: 100 million reads
  • 85. MBD_Seq MGMT = dual core
  • 86. MBD_Seq Genome-wide methylation … by next generation sequencing [Figure: # samples versus # markers] Discovery: 1-2 million methylation cores
  • 87. Data integration: correlation tracks between methylation and expression, ranging from Corr = -1 to Corr = 1
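A correlation track of this kind assigns each locus a value between -1 and 1 by correlating methylation with expression across samples. The sketch below computes a plain Pearson correlation for one locus. The choice of Pearson correlation and the sample values are assumptions made for illustration; the slide does not state which correlation measure was used.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-sample values for one locus (not data from the slides).
methylation = [0.9, 0.7, 0.4, 0.1]
expression = [1.0, 2.5, 4.0, 6.5]
print(round(pearson(methylation, expression), 2))
```

A value near -1 is the pattern expected when promoter methylation silences a gene: samples with more methylation show less expression.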
  • 88. Correlation track in GBM @ MGMT (scale from +1 to -1)
  • 89. Genome-wide methylation … by next generation sequencing [Figure: # samples versus # markers; methods MBD_Seq, 454_BT_Seq, MSP; phases Discovery, Verification, Validation]
  • 91. Deep MGMT Heterogenic complexity
  • 92.