FBW 20-10-2011 Wim Van Criekinge
Course contents: Bioinformatics
Overview
NCBI - The National Center for Biotechnology Information
http://www.ncbi.nlm.nih.gov/
The National Center for Biotechnology Information (NCBI) is located at the National Library of Medicine (NLM), a part of the National Institutes of Health (NIH).

ExPASy - Molecular Biology Server
http://expasy.hcuge.ch/www/
Molecular biology WWW server of the Swiss Institute of Bioinformatics (SIB). This server is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.

EBI - European Bioinformatics Institute
http://www.ebi.ac.uk/
Anno 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2011
Definitions
Identity: The extent to which two (nucleotide or amino acid) sequences are invariant.
Homology: Similarity attributed to descent from a common ancestor.

RBP:        26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84
               + K ++ + + GTW ++ MA + L + A V T + + L + W +
glycodelin: 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81
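As a rough illustration of the identity definition, the sketch below counts identical columns in the RBP/glycodelin alignment shown on the slide. The function name `identity` and the choice to divide by the full alignment length (gaps included) are assumptions made for this example; other conventions for the denominator exist.

```python
# Aligned sequences taken from the slide, with '-' marking a gap.
RBP = "RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD-"
GLYCODELIN = "QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN"


def identity(a: str, b: str) -> tuple[int, int]:
    """Return (identical columns, alignment length) for two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as identical only if both residues match and neither is a gap.
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return same, len(a)


same, length = identity(RBP, GLYCODELIN)
print(f"{same} identical of {length} columns = {100 * same / length:.0f}% identity")
```

For this pair the count is 12 identical columns out of 60, that is 20% identity, which matches the letters shown on the middle line of the alignment.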
Definitions
Orthologous: Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function.
Paralogous: Homologous sequences within a single species that arose by gene duplication.
speciation duplication
Multiple sequence alignment of glyceraldehyde-3-phosphate dehydrogenases

fly       GAKKVIISAP SAD.APM..F VCGVNLDAYK PDMKVVSNAS CTTNCLAPLA
human     GAKRVIISAP SAD.APM..F VMGVNHEKYD NSLKIISNAS CTTNCLAPLA
plant     GAKKVIISAP SAD.APM..F VVGVNEHTYQ PNMDIVSNAS CTTNCLAPLA
bacterium GAKKVVMTGP SKDNTPM..F VKGANFDKY. AGQDIVSNAS CTTNCLAPLA
yeast     GAKKVVITAP SS.TAPM..F VMGVNEEKYT SDLKIVSNAS CTTNCLAPLA
archaeon  GADKVLISAP PKGDEPVKQL VYGVNHDEYD GE.DVVSNAS CTTNSITPVA

fly       KVINDNFEIV EGLMTTVHAT TATQKTVDGP SGKLWRDGRG AAQNIIPAST
human     KVIHDNFGIV EGLMTTVHAI TATQKTVDGP SGKLWRDGRG ALQNIIPAST
plant     KVVHEEFGIL EGLMTTVHAT TATQKTVDGP SMKDWRGGRG ASQNIIPSST
bacterium KVINDNFGII EGLMTTVHAT TATQKTVDGP SHKDWRGGRG ASQNIIPSST
yeast     KVINDAFGIE EGLMTTVHSL TATQKTVDGP SHKDWRGGRT ASGNIIPSST
archaeon  KVLDEEFGIN AGQLTTVHAY TGSQNLMDGP NGKP.RRRRA AAENIIPTST

fly       GAAKAVGKVI PALNGKLTGM AFRVPTPNVS VVDLTVRLGK GASYDEIKAK
human     GAAKAVGKVI PELNGKLTGM AFRVPTANVS VVDLTCRLEK PAKYDDIKKV
plant     GAAKAVGKVL PELNGKLTGM AFRVPTSNVS VVDLTCRLEK GASYEDVKAA
bacterium GAAKAVGKVL PELNGKLTGM AFRVPTPNVS VVDLTVRLEK AATYEQIKAA
yeast     GAAKAVGKVL PELQGKLTGM AFRVPTVDVS VVDLTVKLNK ETTYDEIKKV
archaeon  GAAQAATEVL PELEGKLDGM AIRVPVPNGS ITEFVVDLDD DVTESDVNAA
It is very important to realize that all subsequent results depend critically on just how this is done and on what model lies at the basis of the construction of a specific scoring matrix. A scoring matrix is a tool to quantify how well a certain model is represented in the alignment of two sequences, and any result obtained by its application is meaningful exclusively in the context of that model.
G and C: purine-pyrimidine pair
A and T: purine-pyrimidine pair
     A  T  C  G
A    0  5  5  1
T    5  0  1  5
C    5  1  0  5
G    1  5  5  0
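The values in this matrix read like a cost (distance) table: identical bases cost 0, the purine-purine and pyrimidine-pyrimidine exchanges (A/G and C/T, i.e. transitions) cost 1, and all other exchanges (transversions) cost 5. That interpretation is inferred from the numbers, since the accompanying slide text was not preserved. Under that assumption, a minimal sketch of scoring an ungapped alignment with it could look like this; the names `COST` and `alignment_cost` are hypothetical.

```python
# Pairwise costs copied from the slide's matrix (0 = identical base).
COST = {
    "A": {"A": 0, "T": 5, "C": 5, "G": 1},
    "T": {"A": 5, "T": 0, "C": 1, "G": 5},
    "C": {"A": 5, "T": 1, "C": 0, "G": 5},
    "G": {"A": 1, "T": 5, "C": 5, "G": 0},
}


def alignment_cost(a: str, b: str) -> int:
    """Sum the per-column costs of two equal-length, ungapped DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(COST[x][y] for x, y in zip(a, b))


# One transition (A/G, cost 1) plus one transversion (T/A, cost 5).
print(alignment_cost("ACGT", "GCGA"))
```

A lower total means a closer pair of sequences under this model; a similarity matrix would invert the sign convention.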
The Genome Chose Its Alphabet With Care
         A  S  G  L  K  V  T  P  E  D  N  I  Q  R  F  Y  C  H  M  W  Z  B  X
Ala = A  0  1  1  2  2  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  2  2
Ser = S  1  0  1  1  2  2  1  1  2  2  1  1  2  1  1  1  1  2  2  1  2  2  2
Gly = G  1  1  0  2  2  1  2  2  1  1  2  2  2  1  2  2  1  2  2  1  2  2  2
Leu = L  2  1  2  0  2  1  2  1  2  2  2  1  1  1  1  2  2  1  1  1  2  2  2
Lys = K  2  2  2  2  0  2  1  2  1  2  1  1  1  1  2  2  2  2  1  2  1  2  2
Val = V  1  2  1  1  2  0  2  2  1  1  2  1  2  2  1  2  2  2  1  2  2  2  2
Thr = T  1  1  2  2  1  2  0  1  2  2  1  1  2  1  2  2  2  2  1  2  2  2  2
Pro = P  1  1  2  1  2  2  1  0  2  2  2  2  1  1  2  2  2  1  2  2  2  2  2
Glu = E  1  2  1  2  1  1  2  2  0  1  2  2  1  2  2  2  2  2  2  2  1  2  2
Asp = D  1  2  1  2  2  1  2  2  1  0  1  2  2  2  2  1  2  1  2  2  2  1  2
Asn = N  2  1  2  2  1  2  1  2  2  1  0  1  2  2  2  1  2  1  2  2  2  1  2
Ile = I  2  1  2  1  1  1  1  2  2  2  1  0  2  1  1  2  2  2  1  2  2  2  2
Gln = Q  2  2  2  1  1  2  2  1  1  2  2  2  0  1  2  2  2  1  2  2  1  2  2
Arg = R  2  1  1  1  1  2  1  1  2  2  2  1  1  0  2  2  1  1  1  1  2  2  2
Phe = F  2  1  2  1  2  1  2  2  2  2  2  1  2  2  0  1  1  2  2  2  2  2  2
Tyr = Y  2  1  2  2  2  2  2  2  2  1  1  2  2  2  1  0  1  1  3  2  2  1  2
Cys = C  2  1  1  2  2  2  2  2  2  2  2  2  2  1  1  1  0  2  2  1  2  2  2
His = H  2  2  2  1  2  2  2  1  2  1  1  2  1  1  2  1  2  0  2  2  2  1  2
Met = M  2  2  2  1  1  1  1  2  2  2  2  1  2  1  2  3  2  2  0  2  2  2  2
Trp = W  2  1  1  1  2  2  2  2  2  2  2  2  2  1  2  2  1  2  2  0  2  2  2
Glx = Z  2  2  2  2  1  2  2  2  1  2  2  2  1  2  2  2  2  2  2  2  1  2  2
Asx = B  2  2  2  2  2  2  2  2  2  1  1  2  2  2  2  1  2  1  2  2  2  1  2
??? = X  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2

The table is generated by calculating the minimum number of base changes required to convert an amino acid in row i to an amino acid in column j. Note Met->Tyr is the only change that requires all 3 codon positions to change.
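The calculation described above can be sketched by enumerating the codons of each amino acid and taking the smallest number of differing positions over all codon pairs. The sketch assumes the standard genetic code (written here with T instead of U); the helper name `min_base_changes` is hypothetical. Values computed this way need not agree with every entry tabulated on the slide, whose codon assignments are not stated.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of its codons.
CODONS: dict[str, list[str]] = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))


def min_base_changes(aa1: str, aa2: str) -> int:
    """Smallest number of differing codon positions between two amino acids."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS[aa1]
        for c2 in CODONS[aa2]
    )


print(min_base_changes("M", "Y"))  # Met (ATG) versus Tyr (TAT, TAC)
```

The Met/Tyr case returns 3 because ATG differs from both TAT and TAC at every position.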
All amino acids have the same general formula
Other similarity scoring matrices might be constructed from any property of amino acids that can be quantified:
- partition coefficients between hydrophobic and hydrophilic phases
- charge
- molecular volume
Unfortunately, …
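To make the idea concrete, here is a minimal sketch that turns one quantifiable property, side-chain charge, into a similarity matrix. The charge values are a deliberate simplification for illustration (Lys and Arg +1, Asp and Glu -1, everything else including His treated as 0 at neutral pH), and scoring a pair as the negative absolute difference is just one possible choice, not a method taken from the slides.

```python
# Approximate side-chain charge at neutral pH.
# Assumption for this sketch: His is treated as neutral.
CHARGE = {aa: 0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
CHARGE.update({"K": 1, "R": 1, "D": -1, "E": -1})


def similarity(a: str, b: str) -> int:
    """Score a pair by how close the property values are (0 = same value)."""
    return -abs(CHARGE[a] - CHARGE[b])


# Full 20 x 20 matrix as a dictionary keyed by residue pair.
MATRIX = {(a, b): similarity(a, b) for a in CHARGE for b in CHARGE}

print(MATRIX[("K", "R")], MATRIX[("K", "D")], MATRIX[("A", "K")])
```

In this toy matrix every pair of uncharged residues scores 0, so Ala/Trp looks as good as Ala/Ala; a single property only separates residues along that one axis.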
AAindex
Protein Eng. 1996 Jan;9(1):27-36.
First step: finding “accepted mutations”
Dayhoff’s PAM1 mutation probability matrix  (Transition Matrix)
PAM1: Transition Matrix
Second step: Frequencies of Occurrence
Amino acid frequencies
Third step: Relative Mutabilities
Fourth step: Mutation Probability Matrix
M_ij: the mutation probability matrix gives the probability that an amino acid i will replace an amino acid of type j in a given evolutionary interval, in two related sequences.
Figure: sequences ADB and ADA; matrix with rows i and columns j over A, D, B.
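The formula on this slide was not preserved. As commonly described for Dayhoff's construction, the off-diagonal entries are M_ij = λ · m_j · A_ij / Σ_k A_kj and the diagonal is M_jj = 1 − λ · m_j, where A holds accepted mutation counts, m_j is the relative mutability and λ scales the matrix so that 1% of positions change. The sketch below follows that description on the toy alphabet A, D, B from the slide; all counts, mutabilities and frequencies are hypothetical numbers chosen only for illustration.

```python
LETTERS = "ADB"  # toy alphabet borrowed from the slide's example

# Hypothetical symmetric counts of accepted point mutations between letters.
COUNTS = {("A", "D"): 30, ("A", "B"): 10, ("D", "B"): 20}
# Hypothetical relative mutabilities and background frequencies.
MUTABILITY = {"A": 1.0, "D": 0.8, "B": 0.5}
FREQ = {"A": 0.5, "D": 0.3, "B": 0.2}


def count(i: str, j: str) -> int:
    """Accepted-mutation count for an unordered pair of letters."""
    return COUNTS.get((i, j)) or COUNTS.get((j, i), 0)


def mutation_matrix(target_change: float = 0.01) -> dict:
    """M[i][j]: probability that letter j is replaced by letter i."""
    # Scale factor so that the expected fraction of changed positions is target_change.
    lam = target_change / sum(FREQ[j] * MUTABILITY[j] for j in LETTERS)
    m = {i: {} for i in LETTERS}
    for j in LETTERS:
        column_total = sum(count(k, j) for k in LETTERS if k != j)
        for i in LETTERS:
            if i == j:
                m[i][j] = 1.0 - lam * MUTABILITY[j]
            else:
                m[i][j] = lam * MUTABILITY[j] * count(i, j) / column_total
    return m


M = mutation_matrix()
for i in LETTERS:
    print(i, [round(M[i][j], 5) for j in LETTERS])
```

Each column of the result sums to 1, and the frequency-weighted probability of change equals the 1% target, which is what makes it a "1 PAM" matrix in this toy setting.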
Fifth step: The Evolutionary Distance
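In the PAM framework a matrix for a larger evolutionary distance is usually obtained by multiplying the PAM1 matrix by itself, so that PAM250 corresponds to PAM1 raised to the 250th power. The sketch below shows this with plain Python lists; the 3 x 3 matrix `PAM1_TOY` is a hypothetical stand-in (columns sum to 1), not Dayhoff's data.

```python
def matmul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(m, n):
    """Raise a square matrix to a non-negative integer power."""
    size = len(m)
    result = [[float(i == j) for j in range(size)] for i in range(size)]  # identity
    for _ in range(n):
        result = matmul(result, m)
    return result


# Hypothetical PAM1-like matrix; entry [i][j] is P(j replaced by i), columns sum to 1.
PAM1_TOY = [
    [0.990, 0.004, 0.002],
    [0.006, 0.992, 0.003],
    [0.004, 0.004, 0.995],
]

PAM250_TOY = matrix_power(PAM1_TOY, 250)
for row in PAM250_TOY:
    print([round(value, 3) for value in row])
```

The diagonal shrinks as the power grows, reflecting that residues are less likely to remain unchanged over longer intervals, while every column still sums to 1.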
Sixth step: Relatedness Odds
Last step: the log-odds matrix
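The formulas for these two steps were lost from the slides. As usually described for PAM matrices, the relatedness odds are R_ij = M_ij / f_i (mutation probability divided by the background frequency of the replacing residue), and the score is 10 · log10(R_ij) rounded to an integer. The sketch below assumes exactly that; the input numbers are hypothetical.

```python
import math


def relatedness_odds(m_ij: float, f_i: float) -> float:
    """Odds that the pair arose by mutation rather than by chance."""
    return m_ij / f_i


def log_odds_score(m_ij: float, f_i: float) -> int:
    """Integer score: 10 * log10 of the relatedness odds."""
    return round(10 * math.log10(relatedness_odds(m_ij, f_i)))


# Hypothetical values: a replacement seen 4x more often than chance,
# and one seen 10x less often than chance.
print(log_odds_score(0.20, 0.05))
print(log_odds_score(0.01, 0.10))
```

Positive scores mark replacements observed more often than expected by chance, negative scores those observed less often; taking logarithms lets the scores of aligned positions be added instead of multiplied.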
Dayhoff’s PAM1 mutation probability matrix  (Transition Matrix)
Weighted Random Selection
PAM-Simulator
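The simulator itself is not reproduced in the transcript. One plausible way to combine the two ideas, sketched here under that assumption, is to pick each replacement residue by weighted random selection from the column of a transition matrix and repeat this for a number of PAM steps. The `TRANSITIONS` table is hypothetical toy data, and `weighted_choice` and `evolve` are names invented for this example.

```python
import random


def weighted_choice(weights: dict, rng: random.Random) -> str:
    """Pick a key with probability proportional to its weight."""
    total = sum(weights.values())
    threshold = rng.random() * total
    running = 0.0
    for key, weight in weights.items():
        running += weight
        if threshold < running:
            return key
    return key  # guards against floating-point round-off at the upper edge


# Hypothetical per-step probabilities: TRANSITIONS[j][i] = P(j is replaced by i).
TRANSITIONS = {
    "A": {"A": 0.990, "D": 0.006, "B": 0.004},
    "D": {"A": 0.004, "D": 0.992, "B": 0.004},
    "B": {"A": 0.002, "D": 0.003, "B": 0.995},
}


def evolve(sequence: str, steps: int, rng: random.Random) -> str:
    """Apply `steps` rounds of independent per-residue mutation."""
    residues = list(sequence)
    for _ in range(steps):
        residues = [weighted_choice(TRANSITIONS[r], rng) for r in residues]
    return "".join(residues)


rng = random.Random(0)  # fixed seed so repeated runs give the same sequence
print(evolve("ADBADBADBA", 250, rng))
```

The standard library's `random.choices` accepts a `weights` argument and could replace `weighted_choice`; the explicit loop is kept to show the cumulative-sum idea.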
A brief history of time (BYA)
Timeline from 4 to 0 BYA: origin of life, origin of eukaryotes, insects, fungi/animal, plant/animal, earliest fossils
Margaret Dayhoff’s 34 protein superfamilies

Protein          PAMs per 100 million years
Ig kappa chain   37
Kappa casein     33
Lactalbumin      27
Hemoglobin       12
Myoglobin        8.9
Insulin          4.4
Histone H4       0.10
Ubiquitin        0.00
BLOSUM: Blocks Substitution Matrix
BLOSUM (BLOck – SUM) scoring
Block = ungapped alignment
Example with amino acids D, N, V, A:

     a  b  c  d  e  f
1    D  D  N  A  A  V
2    D  N  A  V  D  D
3    N  N  V  A  V  V

S = 3 sequences
W = 6 aa
N = (W*S*(S-1))/2 = 18 pairs
A. Observed pairs

Block:
     a  b  c  d  e  f
1    D  D  N  A  A  V
2    D  N  A  V  D  D
3    N  N  V  A  V  V

Pair counts f_ij:
     D  N  A  V
D    1
N    4  1
A    1  1  1
V    3  1  4  1

Relative frequency table g_ij = f_ij / 18:
     D     N     A     V
D    .056
N    .222  .056
A    .056  .056  .056
V    .167  .056  .222  .056

Probability of obtaining a pair if randomly choosing pairs from the block.
B. Expected pairs

Residues in the block DDNAAV / DNAVDD / NNVAVV: DDDDD NNNN AAAA VVVVV

P_i:  D = 5/18, N = 4/18, A = 4/18, V = 5/18

P{Draw DN pair} = P{Draw D, then N, or draw N, then D}
P{Draw DN pair} = P_D P_N + P_N P_D = 2 * (5/18) * (4/18) = .123

Random relative frequency table e_ij:
     D     N     A     V
D    .077
N    .123  .049
A    .123  .099  .049
V    .154  .123  .123  .077

Probability of obtaining a pair of each amino acid drawn independently from the block.
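The counting in parts A and B can be checked with a short script. It takes the three-sequence block from the slide, counts all residue pairs within each column, and computes the expected pair frequencies from the residue frequencies, using the doubling for unlike pairs shown in the DN example. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

BLOCK = ["DDNAAV", "DNAVDD", "NNVAVV"]  # the block from the slide


def observed_pairs(block):
    """Count unordered residue pairs within each column (f_ij)."""
    counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            counts[tuple(sorted((x, y)))] += 1
    return counts


def expected_pairs(block):
    """Expected pair frequencies e_ij from independent residue draws."""
    residues = Counter("".join(block))
    total = sum(residues.values())
    p = {aa: n / total for aa, n in residues.items()}
    expected = {}
    for x in p:
        for y in p:
            if x <= y:
                # Unlike pairs can be drawn in two orders, hence the factor 2.
                expected[(x, y)] = p[x] * p[y] * (1 if x == y else 2)
    return expected


f = observed_pairs(BLOCK)
n_pairs = sum(f.values())  # equals W*S*(S-1)/2
g = {pair: n / n_pairs for pair, n in f.items()}
e = expected_pairs(BLOCK)

for pair in sorted(g):
    print(pair, f[pair], round(g[pair], 3), round(e[pair], 3))
```

Keys are alphabetically sorted tuples, so the DN pair is stored as `("D", "N")` and the VA pair as `("A", "V")`.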
C. Summary (A/B)
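The summary slide's content was not preserved beyond the ratio A/B, that is, observed over expected pair frequency. BLOSUM matrices are commonly reported as 2 · log2 of that ratio, rounded to the nearest integer (half-bit units); the sketch below assumes that scaling and applies it to numbers from the example block. The function name is hypothetical.

```python
import math


def blosum_like_score(observed: float, expected: float) -> int:
    """Rounded log-odds score in half-bit units: 2 * log2(observed / expected)."""
    return round(2 * math.log2(observed / expected))


# DN pair from the example block: observed 4/18, expected 2 * (5/18) * (4/18).
g_dn = 4 / 18
e_dn = 2 * (5 / 18) * (4 / 18)
print(blosum_like_score(g_dn, e_dn))

# DD pair: observed 1/18, expected (5/18)^2.
g_dd = 1 / 18
e_dd = (5 / 18) ** 2
print(blosum_like_score(g_dd, e_dd))
```

A ratio above 1 (pair seen more often than chance) gives a positive score, a ratio below 1 a negative one. With only three sequences these numbers are purely illustrative.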
Rat versus mouse RBP
Rat versus bacterial lipocalin
Dotplots
Dot Plot References
Visual Alignments (Dot Plots)
Dotplot-simulator.pl
Window size = 1, stringency 100%
Noise in Dot Plots
Reduction of Dot Plot Noise: self alignment of ACCTGAGCTCACCTGAGTTA
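The effect of window size and stringency on noise can be tried out on this sequence. The following is a Python sketch written for this purpose, not a port of Dotplot-simulator.pl, whose code is not shown in the slides. A dot is placed at (i, j) when at least `stringency` of the `window` positions starting at i and j match; the function names are hypothetical.

```python
def dotplot(a: str, b: str, window: int = 1, stringency: int = 1) -> set:
    """Return the set of (i, j) with >= stringency matches in a window."""
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(a: str, b: str, dots: set) -> str:
    """Draw the plot as text: sequence a across the top, b down the side."""
    lines = ["  " + a]
    for j, residue in enumerate(b):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(a)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


SEQ = "ACCTGAGCTCACCTGAGTTA"
noisy = dotplot(SEQ, SEQ, window=1, stringency=1)     # every single-base match
filtered = dotplot(SEQ, SEQ, window=3, stringency=3)  # exact 3-base matches only

print(len(noisy), "dots with window 1;", len(filtered), "dots with window 3")
print(render(SEQ, SEQ, filtered))
```

With window 1 the self comparison yields 102 dots; with window 3 at full stringency only 28 remain, namely the main diagonal plus the two short off-diagonal runs produced by the repeated ACCTGAG segment.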
Dotplot-simulator.pl
Chromosome Y self comparison
Available Dot Plot Programs
Weblems

Bioinformatica 20-10-2011-t3-scoring matrices

  • 1.  
  • 2. FBW 20-10-2011 Wim Van Criekinge
  • 15. Definitions
        Identity: the extent to which two (nucleotide or amino acid) sequences are invariant.
        Homology: similarity attributed to descent from a common ancestor.
        RBP:        26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84
        glycodelin: 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81
        Match line (identities as letters, similar residues as +): + K ++ + + GTW ++ MA + L + A V T + + L + W +
  • 16. Definitions
        Orthologous: homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function.
        Paralogous: homologous sequences within a single species that arose by gene duplication.
  • 18. Multiple sequence alignment of glyceraldehyde-3-phosphate dehydrogenases
        fly        GAKKVIISAP SAD.APM..F VCGVNLDAYK PDMKVVSNAS CTTNCLAPLA
        human      GAKRVIISAP SAD.APM..F VMGVNHEKYD NSLKIISNAS CTTNCLAPLA
        plant      GAKKVIISAP SAD.APM..F VVGVNEHTYQ PNMDIVSNAS CTTNCLAPLA
        bacterium  GAKKVVMTGP SKDNTPM..F VKGANFDKY. AGQDIVSNAS CTTNCLAPLA
        yeast      GAKKVVITAP SS.TAPM..F VMGVNEEKYT SDLKIVSNAS CTTNCLAPLA
        archaeon   GADKVLISAP PKGDEPVKQL VYGVNHDEYD GE.DVVSNAS CTTNSITPVA

        fly        KVINDNFEIV EGLMTTVHAT TATQKTVDGP SGKLWRDGRG AAQNIIPAST
        human      KVIHDNFGIV EGLMTTVHAI TATQKTVDGP SGKLWRDGRG ALQNIIPAST
        plant      KVVHEEFGIL EGLMTTVHAT TATQKTVDGP SMKDWRGGRG ASQNIIPSST
        bacterium  KVINDNFGII EGLMTTVHAT TATQKTVDGP SHKDWRGGRG ASQNIIPSST
        yeast      KVINDAFGIE EGLMTTVHSL TATQKTVDGP SHKDWRGGRT ASGNIIPSST
        archaeon   KVLDEEFGIN AGQLTTVHAY TGSQNLMDGP NGKP.RRRRA AAENIIPTST

        fly        GAAKAVGKVI PALNGKLTGM AFRVPTPNVS VVDLTVRLGK GASYDEIKAK
        human      GAAKAVGKVI PELNGKLTGM AFRVPTANVS VVDLTCRLEK PAKYDDIKKV
        plant      GAAKAVGKVL PELNGKLTGM AFRVPTSNVS VVDLTCRLEK GASYEDVKAA
        bacterium  GAAKAVGKVL PELNGKLTGM AFRVPTPNVS VVDLTVRLEK AATYEQIKAA
        yeast      GAAKAVGKVL PELQGKLTGM AFRVPTVDVS VVDLTVKLNK ETTYDEIKKV
        archaeon   GAAQAATEVL PELEGKLDGM AIRVPVPNGS ITEFVVDLDD DVTESDVNAA
  • 54. Other similarity scoring matrices might be constructed from any property of amino acids that can be quantified:
        - partition coefficients between hydrophobic and hydrophilic phases
        - charge
        - molecular volume
        Unfortunately, …
  • 56. Protein Eng. 1996 Jan;9(1):27-36.
  • 65. Dayhoff’s PAM1 mutation probability matrix (Transition Matrix)
  • 83. Dayhoff’s PAM1 mutation probability matrix (Transition Matrix)
  • 91. [Figure: A brief history of time (BYA). Timeline axis from 4 to 0 BYA; labelled events: origin of life, origin of eukaryotes, insects, fungi/animal, plant/animal, earliest fossils.]
  • 92. Margaret Dayhoff’s 34 protein superfamilies
        Protein           PAMs per 100 million years
        Ig kappa chain    37
        Kappa casein      33
        Lactalbumin       27
        Hemoglobin        12
        Myoglobin         8.9
        Insulin           4.4
        Histone H4        0.10
        Ubiquitin         0.00
  • 96. BLOSUM (BLOck-SUM) scoring
        Block = ungapped alignment, e.g. over the amino acids D, N, V, A:
             a b c d e f
          1  D D N A A V
          2  D N A V D D
          3  N N V A V V
        S = 3 sequences, W = 6 aa
        N = (W*S*(S-1))/2 = 18 pairs
  • 97. A. Observed pairs
             a b c d e f
          1  D D N A A V
          2  D N A V D D
          3  N N V A V V
        Pair counts f_ij:
               D    N    A    V
          D    1
          N    4    1
          A    1    1    1
          V    3    1    4    1
        Relative frequency table g_ij = f_ij/18 (probability of obtaining a pair if randomly choosing pairs from the block):
               D     N     A     V
          D   .056
          N   .222  .056
          A   .056  .056  .056
          V   .167  .056  .222  .056
  • 98. B. Expected pairs
        Residues in the block (DDNAAV, DNAVDD, NNVAVV): DDDDD NNNN AAAA VVVVV
        P_i:  D = 5/18, N = 4/18, A = 4/18, V = 5/18
        P{Draw DN pair} = P{Draw D, then N, or draw N, then D}
        P{Draw DN pair} = P_D P_N + P_N P_D = 2 * (5/18)*(4/18) = .123
        Random relative frequency table e_ij (probability of obtaining a pair of each amino acid drawn independently from the block; e_ii = P_i^2 and e_ij = 2 P_i P_j for i ≠ j):
               D     N     A     V
          D   .077
          N   .123  .049
          A   .123  .099  .049
          V   .154  .123  .123  .077
  • 105. [Figure panels: Rat versus mouse RBP; Rat versus bacterial lipocalin]
  • 114. Reduction of Dot Plot Noise: self alignment of ACCTGAGCTCACCTGAGTTA
  • 116. Chromosome Y self comparison
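
Slide 15 defines identity as the extent to which two sequences are invariant. A minimal sketch of that measure on the RBP/glycodelin alignment from the same slide follows. The function name `percent_identity` is hypothetical, and dividing by the number of gap-free columns is an assumption: other conventions divide by the full alignment length or by the shorter sequence.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap ('-').
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Aligned rows from slide 15 (RBP residues 26-84, glycodelin residues 23-81).
rbp = "RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD-"
glycodelin = "QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN"

print(f"{percent_identity(rbp, glycodelin):.1f}% identity")
```

Under this convention the two rows share 12 identical residues over 58 gap-free columns, which agrees with the letters shown in the slide's match line.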
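
Slides 96 to 98 walk through the BLOSUM pair counting on a three-sequence block. The sketch below reproduces those steps under the assumption that pairs are counted within each column and that expected frequencies come from independent draws, as the slides state. The variable names `f`, `g`, `p` and `e` mirror the slide labels f_ij, g_ij, P_i and e_ij; everything else is illustrative.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

block = ["DDNAAV", "DNAVDD", "NNVAVV"]  # ungapped block from slide 96

S, W = len(block), len(block[0])        # 3 sequences, 6 columns
n_pairs = W * S * (S - 1) // 2          # 6 * 3 * 2 / 2 = 18 pairs

# A. Observed pairs: count unordered residue pairs within each column.
f = Counter()
for column in zip(*block):
    for x, y in combinations(column, 2):
        f[tuple(sorted((x, y)))] += 1
g = {pair: count / n_pairs for pair, count in f.items()}

# B. Expected pairs: residue frequencies, then two independent draws.
residue_counts = Counter("".join(block))
total = sum(residue_counts.values())
p = {aa: n / total for aa, n in residue_counts.items()}
e = {
    (x, y): p[x] * p[y] if x == y else 2 * p[x] * p[y]
    for x, y in combinations_with_replacement(sorted(p), 2)
}

# One row per unordered pair: count, observed and expected frequency.
for pair in sorted(e):
    print(pair, f[pair], round(g.get(pair, 0.0), 3), round(e[pair], 3))
```

Pairs are stored as alphabetically sorted tuples, so the D/N pair is looked up as `("D", "N")` regardless of the order in which the residues appear in a column.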

Editor's Notes

  1. Mutation probability matrix for the evolutionary distance of 1 PAM (i.e., one Accepted Point Mutation per 100 amino acids). An element of this matrix, [Mij], gives the probability that the amino acid in column j will be replaced by the amino acid in row i after a given evolutionary interval, in this case 1 PAM. Thus, there is a 0.56% probability that Asp will be replaced by Glu. To simplify the appearance, the elements are shown multiplied by 10,000. (Adapted from Figure 82. Atlas of Protein Sequence and Structure, Suppl 3, 1978, M.O. Dayhoff, ed. National Biomedical Research Foundation, 1979.)
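
The note above describes how to read a PAM1 element: column j is the original amino acid, row i is the replacement, and displayed values are multiplied by 10,000. The sketch below shows that conversion. It also illustrates, as an assumption not stated in the note, how matrices for larger distances are usually obtained in Dayhoff's scheme by multiplying PAM1 by itself. The two-letter matrix `toy_pam1` is hypothetical and is not Dayhoff's data.

```python
SCALE = 10_000  # the note says displayed elements are multiplied by 10,000


def displayed_to_probability(entry: float) -> float:
    """Convert a displayed matrix element back to a probability."""
    return entry / SCALE


def mat_mul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_power(m: list[list[float]], n: int) -> list[list[float]]:
    """Multiply m by itself n times (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = mat_mul(result, m)
    return result


# Asp -> Glu is displayed as 56, i.e. a 0.56% replacement probability.
print(f"{displayed_to_probability(56):.2%}")

# Hypothetical two-letter alphabet; each COLUMN sums to 1 because
# element [i][j] is the probability that letter j is replaced by letter i.
toy_pam1 = [
    [0.99, 0.02],
    [0.01, 0.98],
]
toy_pam2 = mat_power(toy_pam1, 2)
print(toy_pam2)
```

Because each column is a probability distribution over replacements, the columns of any power of the matrix still sum to 1.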