Protein Structural Bioinformatics
Definition
The subdiscipline of bioinformatics that focuses on the
representation, storage, retrieval, analysis, and display of
structural information at the atomic and subcellular spatial
scales.
(From Structural Bioinformatics, by P.E. Bourne & H. Weissig (eds.), John Wiley &
Sons, Inc., 2003, p. 4.)
Why is STRUCTURAL bioinformatics important?
Because a protein’s function is determined by its structure.
Knowledge of a protein’s structure is necessary in order to gain
a full understanding of the biological role of a protein.
Bioinformatics methods can be used to analyze
protein structural data in the following ways:
• Visualization of protein structures
• Alignment of protein structures
• Classification of proteins into families, based on similarity
of their structures
• Prediction of protein structures
• Simulation of protein folding and dynamic motions
Protein structure determination by X-ray crystallography or
NMR is difficult (see PowerPoint slides from last module).
It takes 1-3 years to solve a protein structure by these methods. Certain
proteins, such as membrane proteins, are extremely difficult or impossible to
solve by these methods. Due to genomic sequencing efforts, the gap
between known protein sequences and known protein structures is
increasing: only about 3,000 unique protein structures have been
determined, but over 1 million unique sequences have been determined.
Therefore, it is necessary to use bioinformatics methods to predict the
structures of proteins for which a crystal structure or NMR structure has not
been determined.
Bioinformatics methods can predict:
(1) secondary structural elements in a protein sequence
(2) the tertiary structure of the entire sequence
(3) “special” structures, such as transmembrane α-helices,
transmembrane β-barrels, coiled coils, and leucine zippers
Protein Secondary Structure Prediction
All secondary structure prediction is based on the assumption that there
should be a correlation between amino acid sequence and secondary
structure; in other words, it is assumed that certain stretches of amino acids
are more likely to form one type of secondary structure than another.
During secondary structure prediction, the conformational state of each
residue in a protein sequence is predicted; generally each residue is
predicted as having one of three possible states:
(1) α-helical structure
(2) β-strand
(3) “other” (β-turn, loop, or random coil)
Sometimes β-turn is separated as a 4th state.
Why is prediction of secondary structure useful?
It can help guide sequence alignment or improve existing sequence
alignment of distantly related sequences. It is also an intermediate step in
some methods for tertiary structure prediction.
Methods of secondary structure prediction fall into
two broad classes:
Ab initio methods– predict secondary structure based solely
on protein sequence; these methods compute statistics for the
residues that occur in different secondary structural elements in
proteins with known structures, in order to identify “patterns” in
the types of residues that occur in a given type of secondary
structure.
Homology-based methods– make use of multiple sequence
alignments of homologous proteins to predict secondary
structure; these methods are able to locate conserved patterns
that are characteristic of particular secondary structural
elements across the aligned family members.
Certain amino acids are observed more frequently than others in
α-helices, β-strands, and β-turns in crystal structures (see Figure). This
leads to the idea that each amino acid tends to “prefer” being
constrained in a certain type of secondary structure, or has an
“intrinsic propensity” to adopt that secondary structure.
Fig. 4-10 from Lehninger Principles of Biochemistry, 4th ed.
The figure shows that:
Glu, Met, Ala are most
frequent in α-helices
Val, Tyr, Ile are most
frequent in β-strands
Pro, Gly, Asn are most
frequent in β-turns
Based on this data, it
is believed that Glu
has a high α-helical
propensity, but a low
β-strand propensity.
Ab initio methods of secondary structure prediction:
• These methods calculate the relative propensity (intrinsic tendency) of each
amino acid in a protein sequence to belong to a certain secondary structural
element.
• Propensity scores for the 20 amino acids are derived from known protein
structures: these propensities are calculated from the relative frequency of a
given amino acid within the proteins, its frequency in a given type of
secondary structure, and the fraction of all amino acids occurring in that type
of secondary structure.
• Stretches of a protein’s sequence that contain many residues with a high
α-helical propensity are predicted to fold into α-helices. Stretches of sequence
that contain many residues with a high β-strand propensity are predicted to
fold into β-strands.
• Two examples: Chou-Fasman method, GOR method
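The propensity calculation described in the bullets above can be sketched in a few lines. This is only an illustration of the ratio, not the Chou-Fasman or GOR procedure itself: the function name `propensities`, the one-letter state codes (H = helix, E = strand, C = other), and the two toy sequences are assumptions made up for the example, not real structural data.

```python
from collections import Counter

def propensities(sequences, states, state="H"):
    """Return {amino acid: propensity for `state`}.

    propensity = (fraction of that amino acid found in `state`)
                 / (fraction of all residues found in `state`)
    """
    aa_total = Counter()      # how often each amino acid occurs at all
    aa_in_state = Counter()   # how often it occurs in the chosen state
    n_total = 0
    n_in_state = 0
    for seq, ss in zip(sequences, states):
        if len(seq) != len(ss):
            raise ValueError("sequence and state string must have equal length")
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            n_total += 1
            if s == state:
                aa_in_state[aa] += 1
                n_in_state += 1
    if n_in_state == 0:
        raise ValueError("the requested state never occurs in the data")
    background = n_in_state / n_total
    return {aa: (aa_in_state[aa] / aa_total[aa]) / background for aa in aa_total}

# Hypothetical toy data: two short sequences with per-residue states.
toy_seqs = ["AEELGV", "GVAEPG"]
toy_states = ["HHHHCE", "CEHHCC"]
helix = propensities(toy_seqs, toy_states, "H")
for aa in sorted(helix):
    print(aa, round(helix[aa], 2))
```

A value above 1 means the residue occurs in that state more often than an average residue does; a value below 1 means it occurs there less often. With real data the counts would come from a large set of known structures rather than two toy strings.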
Accuracy of ab initio methods:
• These methods are not very accurate:
• Chou-Fasman method, 50%-60% accuracy
• GOR method, 64% accuracy, drastically underpredicts β-strands
• These methods are only a little better than randomly assigning secondary
structure! Known proteins consist of ~31% α-helix and ~28% β-sheet, so
randomly assigning secondary structural elements to residues would result in
~30% accuracy.
• Specific problems with these methods:
• Tend to underpredict the lengths of α-helices and β-strands: can’t
identify the first and last residues of helices and strands very well
• Tend to miss β-strands completely
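The random baseline quoted above can be checked with a short calculation. The sketch below assumes that the remaining ~41% of residues (100% minus 31% minus 28%) fall in the "other" state, and it compares two hypothetical guessing schemes: picking each of the three states with equal probability, and picking states in proportion to how often they occur. The function name is made up for the example.

```python
def expected_random_accuracy(true_freqs, guess_freqs):
    """Expected per-residue accuracy when states are guessed
    independently of the sequence.

    A guess is correct when the guessed state equals the true state,
    so the expectation is the sum over states of P(true) * P(guess).
    """
    return sum(true_freqs[s] * guess_freqs[s] for s in true_freqs)

# Composition from the slide; "other" is the assumed remainder.
true_freqs = {"helix": 0.31, "strand": 0.28, "other": 0.41}

uniform = {s: 1 / 3 for s in true_freqs}
acc_uniform = expected_random_accuracy(true_freqs, uniform)
acc_proportional = expected_random_accuracy(true_freqs, true_freqs)

print(round(acc_uniform, 3))       # 0.333
print(round(acc_proportional, 3))  # 0.343
```

Both schemes land in the low-to-mid 30% range, which is consistent with the rough ~30% figure given in the slide.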
A few homology-based secondary structure prediction methods:
Neural network methods:
PROFsec (an improved version of PHDsec)
http://www.predictprotein.org/
PSIPRED
http://bioinf.cs.ucl.ac.uk/psipred/
SSpro (newest version is 4.0)
http://scratch.proteomics.ics.uci.edu/
SAM-T (SAM-T08 is newest version; SAM-T06, SAM-T02, SAM-T99 are old versions)
http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html
Nearest-neighbor methods:
NNSSP
no longer available online
PREDATOR
http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::predator
HMM methods:
HMMSTR
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
A few methods for predicting transmembrane α-helices:
TMHMM
http://www.cbs.dtu.dk/services/TMHMM/
HMMTOP
http://www.enzim.hu/hmmtop/index.html
Phobius (also predicts presence of signal peptides)
http://phobius.sbc.su.se/
TopPred
http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::toppred
PRED-TMR
http://athina.biol.uoa.gr/PRED-TMR/
DAS
http://mendel.imp.ac.at/sat/DAS/DAS.html
TMpred
http://www.ch.embnet.org/software/TMPRED_form.html
MEMSAT
http://bioinf.cs.ucl.ac.uk/psipred/
Accuracies of the methods:
Levels of accuracy are reported by the developers to be in the range of 75-95%.
At least one study (2001) found TMHMM to be the best performing program.
It is best to use several methods and compare the results to arrive at a consensus
prediction. When different methods, specifically methods that are based on different
algorithms, give similar results, the reliability of the results is higher.
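One simple way to combine several predictions into a consensus is a per-residue majority vote. The sketch below assumes each method's output has already been converted to a string of equal length, with `M` for a predicted membrane residue and `-` for everything else. That encoding, the function name, and the three toy predictions are assumptions for illustration, not real output from any of the servers listed above.

```python
from collections import Counter

def consensus(predictions):
    """Per-residue majority vote over equal-length prediction strings.

    Positions where the top two labels tie are marked '?'.
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same residues")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            result.append("?")          # no majority at this position
        else:
            result.append(counts[0][0])
    return "".join(result)

# Hypothetical predictions from three methods for a 10-residue stretch.
toy = ["--MMMMMM--", "---MMMMM--", "--MMMMM---"]
print(consensus(toy))  # --MMMMMM--
```

Positions where the methods disagree, typically the ends of a helix, are the ones to inspect by hand.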
Tertiary structure prediction methods fall into three
classes:
(1) Homology modeling (also called comparative modeling)
A structure is built based on the known structure of another protein that is
similar in sequence (a homolog).
(2) Threading (also called structural fold recognition)
A structure is predicted for a protein by “threading” its sequence through a
variety of known structures to determine which structure the sequence best
fits.
(3) Ab initio prediction (also called de novo prediction)
A structure is predicted based only on the amino acid sequence of the
protein, using the physicochemical properties of its residues and the
principles governing protein folding.
Homology modeling for tertiary structure prediction:
Homology modeling is based on the idea that if two proteins share a high
degree of sequence similarity (i.e., they are close homologs), they are likely
to have very similar 3D structures. In general, proteins that share >30%
sequence identity are likely to be quite similar in structure.
Therefore, if a protein of unknown structure is similar in sequence to a
protein of known structure, the known structure can be used as a template to
which the unknown sequence is fit. The structure that is built for the
unknown sequence is then called a homology model for the structure of that
sequence.
The “safe homology
modeling zone,” above the
gray curve, is the region
where two proteins are likely
to have the same structure.
Fig. 5 from R. Nair & B. Rost,
Protein Science (2002) 11: 2836-47.
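Percent sequence identity can be computed directly from a pairwise alignment. The sketch below uses one common convention, identical residues divided by the number of columns in which neither sequence has a gap. Other denominators are also in use, so this choice is an assumption. The function name is hypothetical, and the two strings are the target and template fragment used in the backbone-building step of these slides.

```python
def percent_identity(target_aln, template_aln, gap="-"):
    """Return (identical, aligned, percent) for two aligned sequences.

    Only columns where neither sequence has a gap are counted.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(target_aln, template_aln):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned columns")
    return identical, aligned, 100.0 * identical / aligned

target = "FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF"
template = "FKQQANIHCAYCNGAYKIG-------GKELQVHFSWLF"
identical, aligned, pct = percent_identity(target, template)
print(identical, aligned, round(pct, 1))  # 17 31 54.8
```

A fragment this short says little on its own. Whether two proteins fall in the safe zone depends on alignment length as well as identity, as the figure indicates.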
Steps in homology modeling for tertiary structure
prediction:
The protein of unknown structure for which a structural model is to be built
will be called the “target sequence.”
1. Template selection– Identify protein(s) in the PDB that are
homologous to the target sequence using BLAST or PSI-BLAST. If a close
homolog with known structure is found, its structure will serve as a template
to which the target sequence will be matched. The template should have
at least 30% sequence identity with the target. (Proteins that share less
than 30% sequence identity may not be similar enough in structure to carry
out homology modeling.) If PSI-BLAST does not identify a suitable template,
it will probably be necessary to construct a structural model by threading.
It is possible to use multiple templates if more than one good template is
identified. When multiple templates are available, it is best to use more than
one template to avoid biasing the model toward a single protein. The
template used in the next step of homology modeling will then be an
averaged structure based on all of the chosen templates.
2. Sequence alignment– Construct a multiple sequence alignment of
the target, the template, and other homologous sequences. It is actually the
alignment of the target and template that is of interest, but the inclusion of
other homologs provides more information, helping to ensure that the best
alignment of homologous residues is achieved. The quality of the target-
template alignment is critical for constructing an accurate structural
model for the target. If a given residue in the target is not aligned with the
proper residue in the template, the error cannot be corrected in later steps of
model building. A robust multiple sequence alignment program should be
used for this step, and the resulting alignment should be very carefully
examined and manually refined if necessary.
3. Backbone model building– Residues in the aligned regions of the
target and template are assumed to adopt the same structure. Therefore,
the backbone atoms of these residues in the target can be placed in the
same 3D location as the backbone atoms of these residues in the template.
See the alignment below as an example.
Target: ...FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF...
Template: ...FKQQANIHCAYCNGAYKIG-------GKELQVHFSWLF...
For these residues, backbone atoms of the target are assumed
to occupy the same 3D location as those of the template.
Target F aligned with template F. They are identical,
so all atoms of target F will overlap
the 3D positions of all atoms of
template F.
Target D aligned with template E. They are not identical, but
their backbone atoms can be assumed to
occupy the same 3D position. So backbone
atoms of target D will overlap the 3D
positions of backbone atoms of template E.
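The backbone-copying step can be sketched as a walk along the alignment. The code below assumes the template backbone is available as a dictionary keyed by 0-based residue index, with atom names mapped to (x, y, z) tuples. That data layout, the function name, and the toy coordinates are hypothetical and chosen only for illustration.

```python
def copy_backbone(target_aln, template_aln, template_backbone, gap="-"):
    """Map target residue index -> backbone atoms copied from the
    template residue in the same alignment column.

    Target residues aligned to a gap get no coordinates here; they
    are left for the loop-building step.
    """
    model = {}
    t_idx = 0  # running residue index in the ungapped target
    p_idx = 0  # running residue index in the ungapped template
    for a, b in zip(target_aln, template_aln):
        if a != gap and b != gap:
            model[t_idx] = dict(template_backbone[p_idx])
        if a != gap:
            t_idx += 1
        if b != gap:
            p_idx += 1
    return model

# Toy example: 5-residue target, 3-residue template, made-up coordinates.
toy_target = "FKAAD"
toy_template = "FK--E"
toy_backbone = {
    0: {"CA": (0.0, 0.0, 0.0)},
    1: {"CA": (3.8, 0.0, 0.0)},
    2: {"CA": (7.6, 0.0, 0.0)},
}
model = copy_backbone(toy_target, toy_template, toy_backbone)
print(sorted(model))  # [0, 1, 4]
```

Target residues 2 and 3 receive no coordinates because they face gaps in the template; target residue 4 (D) takes the backbone of template residue 2 (E).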
4. Loop building– There are likely to be regions in the alignment where
gaps appear because the target sequence does not match the template. The
target sequence residues in these gap regions are assumed to form a loop that
is not present in the template structure. The structure of this loop can be built
using several different methods. In any case, it is a difficult problem since the
template provides no information to guide the building of the loop structure.
Target: ...FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF...
Template: ...FKQQANIHCAYCNGAYKIG-------GKELQVHFSWLF...
“Extra” residues in the target sequence do not
match the template and are assumed to form a loop.
target loop
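Finding the residues that need loop building amounts to locating the target stretches that face gaps in the template. The sketch below does this for the alignment fragment shown above. The function name is hypothetical, and positions are 1-based within the fragment rather than true residue numbers.

```python
def target_loops(target_aln, template_aln, gap="-"):
    """Return [(start, end, residues)] for target stretches that are
    aligned to gaps in the template (1-based target positions)."""
    loops = []
    pos = 0          # position in the ungapped target
    start = None     # start of the loop currently being collected
    residues = []
    for a, b in zip(target_aln, template_aln):
        if a != gap:
            pos += 1
        if a != gap and b == gap:
            if start is None:
                start = pos
            residues.append(a)
        elif start is not None:
            loops.append((start, start + len(residues) - 1, "".join(residues)))
            start, residues = None, []
    if start is not None:  # loop running to the end of the alignment
        loops.append((start, start + len(residues) - 1, "".join(residues)))
    return loops

target = "FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF"
template = "FKQQANIHCAYCNGAYKIG-------GKELQVHFSWLF"
loops = target_loops(target, template)
print(loops)  # [(20, 26, 'AAASRTP')]
```

The seven residues AAASRTP have no template counterpart, so their coordinates must come from a loop-building method rather than from the template.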
5. Side chain addition– The side chains are added to the backbone
structure. Each side chain could potentially have many possible
conformations due to bond rotation, but steric clashes with neighboring
atoms are not allowed. Therefore, the side-chain conformations that have the lowest interaction
energy with nearby atoms are chosen.
Target: ...FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF...
Template: ...FKQQANIHCAYCNGAYKIG-------GKELQVHFSWLF...
Target and template are both F, so
all atoms of the target side chain
can be modeled as having the same
3D positions as the template side
chain, at least initially. (Small
changes in position may be
necessary in later refinement steps.)
Target and template have different
side chains (D vs. E), so the side
chain rotamer that is chosen for the
target D must not overlap/clash with
any neighboring atoms.
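Choosing a side-chain conformation that avoids clashes can be illustrated with a toy scoring function. Everything in the sketch below is an assumption made for the example: the penalty (squared overlap for atom pairs closer than a cutoff), the 3.0 Å cutoff, the function names, and the coordinates. Real side-chain placement programs use much more detailed energy terms.

```python
import math

def clash_energy(rotamer_atoms, neighbor_atoms, min_dist=3.0):
    """Toy penalty: sum of squared overlaps for every atom pair
    that is closer than min_dist (in Å)."""
    energy = 0.0
    for a in rotamer_atoms:
        for b in neighbor_atoms:
            d = math.dist(a, b)
            if d < min_dist:
                energy += (min_dist - d) ** 2
    return energy

def pick_rotamer(rotamers, neighbor_atoms):
    """Return the name of the candidate with the lowest toy energy."""
    return min(rotamers, key=lambda name: clash_energy(rotamers[name], neighbor_atoms))

# Hypothetical candidates for one side chain, one atom each.
neighbors = [(0.0, 0.0, 0.0)]
rotamers = {
    "r1": [(1.0, 0.0, 0.0)],  # 1 Å from the neighbor: clashes
    "r2": [(4.0, 0.0, 0.0)],  # 4 Å from the neighbor: no clash
}
best = pick_rotamer(rotamers, neighbors)
print(best)  # r2
```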
6. Model refinement– Unfavorable bond angles, bond lengths, and
atom contacts are likely to exist in the preliminary model, so an energy
minimization procedure is applied to refine the model. In this procedure,
atom positions are shifted so that the overall conformation of the entire
structure has the lowest potential energy. Only limited energy minimization
should be applied (a few hundred iterations) so that major errors are
removed but residues are not moved from their correct positions.
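The idea of a limited energy minimization can be shown on a deliberately tiny system. The sketch below is not a protein force field: it assumes a one-dimensional chain of atoms joined by harmonic "bonds" with a made-up ideal length of 1.5 and runs a fixed number of steepest-descent steps. The function names, the step size, and the starting positions are all hypothetical.

```python
def bond_energy(xs, ideal=1.5, k=1.0):
    """Harmonic energy of consecutive bonds in a 1D chain."""
    return sum(0.5 * k * ((b - a) - ideal) ** 2 for a, b in zip(xs, xs[1:]))

def minimize(xs, ideal=1.5, k=1.0, step=0.1, iterations=200):
    """Steepest descent with a fixed iteration budget."""
    xs = list(xs)
    for _ in range(iterations):
        grad = [0.0] * len(xs)
        for i in range(len(xs) - 1):
            stretch = (xs[i + 1] - xs[i]) - ideal
            grad[i] -= k * stretch       # dE/dx_i for this bond
            grad[i + 1] += k * stretch   # dE/dx_(i+1) for this bond
        xs = [x - step * g for x, g in zip(xs, grad)]
    return xs

start = [0.0, 1.0, 3.5, 4.6]   # bonds of 1.0, 2.5 and 1.1: all distorted
relaxed = minimize(start)
print(round(bond_energy(start), 3))
print(bond_energy(relaxed) < 1e-6)
```

Capping the number of iterations mirrors the advice above: the distorted bonds relax toward their ideal length, but the atoms are not given unlimited freedom to drift.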
7. Model evaluation– The model is checked for anomalies in dihedral
angles, bond lengths, and atom contacts.
Programs for homology modeling:
Many programs for automated homology modeling are now available, so
anyone can construct a homology model on a regular PC. However,
construction of a “good” homology model (at least for sequences that are not
highly similar) usually requires some expertise and usually should be done
with human intervention, rather than in a fully automated fashion.
A few of the freely available programs for homology
modeling:
SWISS-MODEL– Produces accurate models; fast; good tutorials available.
http://swissmodel.expasy.org/
I-TASSER– Produces accurate models; easy to use, but slow
http://zhanglab.ccmb.med.umich.edu/I-TASSER/
Modeller– must be downloaded and installed locally
http://salilab.org/modeller/modeller.html
WHAT IF
http://swift.cmbi.ru.nl/servers/html/index.html
http://swift.cmbi.ru.nl/whatif/
Is a homology model CORRECT?
Since the actual (experimentally determined) structure of the target is not
known, there is no way to say whether or not the homology model is
“correct.” Instead, the best a researcher can do is compare the homology
model to the structure of the template from which it was derived. If the atom
positions in the model do not deviate very much from those of the template,
the homology model is said to be “accurate.” The greater the deviation
between model and template, the lower the accuracy of the model.
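The slides do not name a specific measure of deviation. One widely used choice is the root-mean-square deviation (RMSD) between matching atoms. The sketch below assumes the two structures are already superimposed and that their atoms are listed in the same order. The function name and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized, already superimposed
    lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Toy example: every model atom is shifted 1 Å along z from the template.
template_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_xyz = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(model_xyz, template_xyz)
print(value)  # 1.0
```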
When is a homology model definitely INCORRECT?
A homology model has regions that are incorrect if it contains structural
features that do not occur in native proteins, such as:
• Hydrophobic side chains on the surface of the model (these side
chains should be buried)
• Unreasonable bond lengths or angles
• Unfavorable noncovalent contacts between atoms (clashes)
• Unreasonable dihedral angles
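A simple geometric check in the same spirit is to measure the distance between consecutive Cα atoms, which is close to 3.8 Å for the usual trans peptide bond. The sketch below flags pairs that deviate from that value. The 0.2 Å tolerance, the function name, and the toy coordinates are assumptions for illustration; dedicated validation programs apply many more checks.

```python
import math

def flag_ca_distances(ca_coords, expected=3.8, tolerance=0.2):
    """Return [(index, distance)] for consecutive CA pairs whose
    distance differs from `expected` by more than `tolerance` (Å)."""
    flagged = []
    for i, (a, b) in enumerate(zip(ca_coords, ca_coords[1:])):
        d = math.dist(a, b)
        if abs(d - expected) > tolerance:
            flagged.append((i, round(d, 2)))
    return flagged

# Toy chain: the last pair is stretched to 5.0 Å.
toy_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (12.6, 0.0, 0.0)]
problems = flag_ca_distances(toy_ca)
print(problems)  # [(2, 5.0)]
```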
Accuracy of homology modeling:
The template selection and alignment accuracy are crucial to the accuracy of a homology
model. The accuracy of the model depends on the percentage of sequence identity
between the target and template. The average coordinate agreement between the
modeled structure and the actual structure drops ~0.3 Å for each 10% reduction in
sequence identity.
The largest structural differences between homologous proteins are in surface loops. In
other words, the structure of the protein core is more highly conserved. Therefore, the
regions that are most likely to be in error in a homology model are the surface loops.
High-accuracy homology models can be built when the target and template have 50%
or greater sequence identity. Errors are mostly mistakes in side-chain packing, small
shifts of the core backbone regions, and occasionally larger errors in loops.
Medium-accuracy homology models can be built when the proteins share 30-50%
sequence identity. There can be alignment mistakes, and there are more frequent side-
chain packing, core distortion, and loop modeling errors.
Low-accuracy homology models are based on proteins that share <30% sequence
identity. If a model is based on an almost insignificant alignment to a known structure, the
model may have an entirely incorrect fold.
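The three accuracy tiers above amount to a simple lookup on target-template sequence identity. The function below only restates the thresholds from the text (50% and 30%); its name is hypothetical, and the tiers are rules of thumb rather than guarantees.

```python
def expected_model_accuracy(percent_identity):
    """Map target-template sequence identity (%) to the accuracy tier
    described in the text."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= 50:
        return "high"
    if percent_identity >= 30:
        return "medium"
    return "low"

for pid in (65, 40, 22):
    print(pid, expected_model_accuracy(pid))
```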
The best model-building programs will produce models of similar accuracy, provided that
the methods are used optimally.
Stephen James
stephen@macfast.org
9746935363

Mais conteúdo relacionado

Mais procurados

Protein Predictinon
Protein PredictinonProtein Predictinon
Protein Predictinon
SHRADHEYA GUPTA
 

Mais procurados (20)

Protein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modelingProtein fold recognition and ab_initio modeling
Protein fold recognition and ab_initio modeling
 
methods for protein structure prediction
methods for protein structure predictionmethods for protein structure prediction
methods for protein structure prediction
 
Genome annotation
Genome annotationGenome annotation
Genome annotation
 
Homology modeling
Homology modelingHomology modeling
Homology modeling
 
Protein folding
Protein foldingProtein folding
Protein folding
 
Genome Assembly
Genome AssemblyGenome Assembly
Genome Assembly
 
Protein Predictinon
Protein PredictinonProtein Predictinon
Protein Predictinon
 
Protein folding slids
Protein folding slidsProtein folding slids
Protein folding slids
 
Cath
CathCath
Cath
 
Dynamic programming
Dynamic programming Dynamic programming
Dynamic programming
 
artificial neural network-gene prediction
artificial neural network-gene predictionartificial neural network-gene prediction
artificial neural network-gene prediction
 
Protein database
Protein databaseProtein database
Protein database
 
Protein Threading
Protein ThreadingProtein Threading
Protein Threading
 
RNA structure analysis
RNA structure analysis RNA structure analysis
RNA structure analysis
 
Protein purification
Protein purificationProtein purification
Protein purification
 
System biology and its tools
System biology and its toolsSystem biology and its tools
System biology and its tools
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
Gene prediction and expression
Gene prediction and expressionGene prediction and expression
Gene prediction and expression
 
Blast Algorithm
Blast AlgorithmBlast Algorithm
Blast Algorithm
 
Sequence comparison techniques
Sequence comparison techniquesSequence comparison techniques
Sequence comparison techniques
 

Destaque

protein sturcture prediction and molecular modelling
protein sturcture prediction and molecular modellingprotein sturcture prediction and molecular modelling
protein sturcture prediction and molecular modelling
Dileep Paruchuru
 
MEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational ExperimentsMEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational Experiments
GIScRG
 
A search engine for phylogenetic tree databases - D. Fernándes-Baca
A search engine for phylogenetic tree databases - D. Fernándes-BacaA search engine for phylogenetic tree databases - D. Fernándes-Baca
A search engine for phylogenetic tree databases - D. Fernándes-Baca
Roderic Page
 
Project report-on-bio-informatics
Project report-on-bio-informaticsProject report-on-bio-informatics
Project report-on-bio-informatics
Daniela Rotariu
 

Destaque (20)

Chou fasman algorithm for protein structure prediction
Chou fasman algorithm for protein structure predictionChou fasman algorithm for protein structure prediction
Chou fasman algorithm for protein structure prediction
 
Protein Structure Prediction
Protein Structure PredictionProtein Structure Prediction
Protein Structure Prediction
 
protein sturcture prediction and molecular modelling
protein sturcture prediction and molecular modellingprotein sturcture prediction and molecular modelling
protein sturcture prediction and molecular modelling
 
RNA secondary structure prediction
RNA secondary structure predictionRNA secondary structure prediction
RNA secondary structure prediction
 
Lit Review Talk by Kato Mivule: A Review of Genetic Algorithms
Lit Review Talk by Kato Mivule: A Review of Genetic AlgorithmsLit Review Talk by Kato Mivule: A Review of Genetic Algorithms
Lit Review Talk by Kato Mivule: A Review of Genetic Algorithms
 
Ab Initio Protein Structure Prediction
Ab Initio Protein Structure PredictionAb Initio Protein Structure Prediction
Ab Initio Protein Structure Prediction
 
Protein structure classification
Protein structure classificationProtein structure classification
Protein structure classification
 
Homology modelling
Homology modellingHomology modelling
Homology modelling
 
Sk rndm grmmrs
Sk rndm grmmrsSk rndm grmmrs
Sk rndm grmmrs
 
Presentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali ShahPresentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali Shah
 
MEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational ExperimentsMEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational Experiments
 
Ph.D. work
Ph.D. workPh.D. work
Ph.D. work
 
Structure prediction of Proteins
Structure prediction of ProteinsStructure prediction of Proteins
Structure prediction of Proteins
 
A search engine for phylogenetic tree databases - D. Fernándes-Baca
A search engine for phylogenetic tree databases - D. Fernándes-BacaA search engine for phylogenetic tree databases - D. Fernándes-Baca
A search engine for phylogenetic tree databases - D. Fernándes-Baca
 
Presentation for blast algorithm bio-informatice
Presentation for blast algorithm bio-informaticePresentation for blast algorithm bio-informatice
Presentation for blast algorithm bio-informatice
 
Abinitio.ppt
Abinitio.pptAbinitio.ppt
Abinitio.ppt
 
Project report-on-bio-informatics
Project report-on-bio-informaticsProject report-on-bio-informatics
Project report-on-bio-informatics
 
Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure prediction
 
BLAST
BLASTBLAST
BLAST
 
Ketone bodies, ketosis & it’s pathogenesis
Ketone bodies, ketosis & it’s pathogenesisKetone bodies, ketosis & it’s pathogenesis
Ketone bodies, ketosis & it’s pathogenesis
 

Semelhante a Protein structure 2

58.Comparative modelling of cellulase from Aspergillus terreus
58.Comparative modelling of cellulase from Aspergillus terreus58.Comparative modelling of cellulase from Aspergillus terreus
58.Comparative modelling of cellulase from Aspergillus terreus
Annadurai B
 
Computational Prediction Of Protein-1.pptx
Computational Prediction Of Protein-1.pptxComputational Prediction Of Protein-1.pptx
Computational Prediction Of Protein-1.pptx
ashharnomani
 
Comparative Protein Structure Modeling and itsApplications
Comparative Protein Structure Modeling and itsApplicationsComparative Protein Structure Modeling and itsApplications
Comparative Protein Structure Modeling and itsApplications
LynellBull52
 
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to function
Abhik Seal
 

Semelhante a Protein structure 2 (20)

Threading modeling methods
Threading modeling methodsThreading modeling methods
Threading modeling methods
 
58.Comparative modelling of cellulase from Aspergillus terreus
58.Comparative modelling of cellulase from Aspergillus terreus58.Comparative modelling of cellulase from Aspergillus terreus
58.Comparative modelling of cellulase from Aspergillus terreus
 
Computational predictiction of prrotein structure
Computational predictiction of prrotein structureComputational predictiction of prrotein structure
Computational predictiction of prrotein structure
 
Modelling Proteins By Computational Structural Biology
Modelling Proteins By Computational Structural BiologyModelling Proteins By Computational Structural Biology
Modelling Proteins By Computational Structural Biology
 
Drug discovery presentation
Drug discovery presentationDrug discovery presentation
Drug discovery presentation
 
HOMOLOGY MODELING IN EASIER WAY
HOMOLOGY MODELING IN EASIER WAYHOMOLOGY MODELING IN EASIER WAY
HOMOLOGY MODELING IN EASIER WAY
 
L1Protein_Structure_Analysis.pptx
L1Protein_Structure_Analysis.pptxL1Protein_Structure_Analysis.pptx
L1Protein_Structure_Analysis.pptx
 
Computational Prediction Of Protein-1.pptx
Computational Prediction Of Protein-1.pptxComputational Prediction Of Protein-1.pptx
Computational Prediction Of Protein-1.pptx
 
protein Modeling Abi.pptx
protein Modeling Abi.pptxprotein Modeling Abi.pptx
protein Modeling Abi.pptx
 
homology modellign lecture .pdf
homology modellign lecture .pdfhomology modellign lecture .pdf
homology modellign lecture .pdf
 
homology modellign lecture .pdf
homology modellign lecture .pdfhomology modellign lecture .pdf
homology modellign lecture .pdf
 
Comparative Protein Structure Modeling and itsApplications
Comparative Protein Structure Modeling and itsApplicationsComparative Protein Structure Modeling and itsApplications
Comparative Protein Structure Modeling and itsApplications
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
 
Presentation1
Presentation1Presentation1
Presentation1
 
Protein struc pred-Ab initio and other methods as a short introduction.ppt
Protein struc pred-Ab initio and other methods as a short introduction.pptProtein struc pred-Ab initio and other methods as a short introduction.ppt
Protein struc pred-Ab initio and other methods as a short introduction.ppt
 
Homology modelling
Homology modellingHomology modelling
Homology modelling
 
6. protein secondry structure ppt
6. protein secondry structure ppt6. protein secondry structure ppt
6. protein secondry structure ppt
 
demonstration lecture on Homology modeling
demonstration lecture on Homology modelingdemonstration lecture on Homology modeling
demonstration lecture on Homology modeling
 
HOMOLOGY MODELLING.pptx
HOMOLOGY MODELLING.pptxHOMOLOGY MODELLING.pptx
HOMOLOGY MODELLING.pptx
 
Mapping protein to function
Mapping protein to functionMapping protein to function
Mapping protein to function
 

Mais de Rainu Rajeev

Mais de Rainu Rajeev (6)

Jsir 59(2) 87 101
Jsir 59(2) 87 101Jsir 59(2) 87 101
Jsir 59(2) 87 101
 
Areps siddiqui etal 2013
Areps siddiqui etal 2013Areps siddiqui etal 2013
Areps siddiqui etal 2013
 
Marine microbiology ecology &amp; applications colin munn
Marine microbiology  ecology &amp; applications colin munnMarine microbiology  ecology &amp; applications colin munn
Marine microbiology ecology &amp; applications colin munn
 
Agricultural biotechnology
Agricultural biotechnologyAgricultural biotechnology
Agricultural biotechnology
 
Marine board pp17_microcean
Marine board pp17_microceanMarine board pp17_microcean
Marine board pp17_microcean
 
Bioinformatics final
Bioinformatics finalBioinformatics final
Bioinformatics final
 

Último

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
LeenakshiTyagi
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 

Último (20)

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx

Protein structure 2

  • 1.
  • 2. Protein Structural Bioinformatics Definition: The subdiscipline of bioinformatics that focuses on the representation, storage, retrieval, analysis, and display of structural information at the atomic and subcellular spatial scales. (From Structural Bioinformatics, by P.E. Bourne & H. Weissig (eds.), John Wiley & Sons, Inc., 2003, p. 4.) Why is STRUCTURAL bioinformatics important? Because a protein’s function is determined by its structure. Knowledge of a protein’s structure is necessary in order to gain a full understanding of the biological role of a protein.
  • 3. Bioinformatics methods can be used to analyze protein structural data in the following ways: • Visualization of protein structures • Alignment of protein structures • Classification of proteins into families, based on similarity of their structures • Prediction of protein structures • Simulation of protein folding and dynamic motions
  • 4. Protein structure determination by X-ray crystallography or NMR is difficult (see PowerPoint slides from last module). It takes 1-3 years to solve a protein structure by these methods. Certain proteins, such as membrane proteins, are extremely difficult or impossible to solve by these methods. Due to genomic sequencing efforts, the gap between known protein sequences and known protein structures is increasing: only about 3,000 unique protein structures have been determined, but over 1 million unique sequences have been determined. Therefore, it is necessary to use bioinformatics methods to predict the structures of proteins for which a crystal structure or NMR structure has not been determined. Bioinformatics methods can predict: (1) secondary structural elements in a protein sequence; (2) the tertiary structure of the entire sequence; (3) “special” structures, such as transmembrane α-helices, transmembrane β-barrels, coiled coils, and leucine zippers.
  • 5. Protein Secondary Structure Prediction: All secondary structure prediction is based on the assumption that there should be a correlation between amino acid sequence and secondary structure; in other words, it is assumed that certain stretches of amino acids are more likely to form one type of secondary structure than another. During secondary structure prediction, the conformational state of each residue in a protein sequence is predicted; generally each residue is predicted as having one of three possible states: (1) α-helical structure; (2) β-strand; (3) “other” (β-turn, loop, or random coil). Sometimes β-turn is separated as a 4th state. Why is prediction of secondary structure useful? It can help guide sequence alignment or improve existing sequence alignment of distantly related sequences. It is also an intermediate step in some methods for tertiary structure prediction.
  • 6. Methods of secondary structure prediction fall into two broad classes: Ab initio methods: predict secondary structure based solely on protein sequence; these methods compute statistics for the residues that occur in different secondary structural elements in proteins with known structures, in order to identify “patterns” in the types of residues that occur in a given type of secondary structure. Homology-based methods: make use of multiple sequence alignments of homologous proteins to predict secondary structure; these methods are able to locate conserved patterns that are characteristic of particular secondary structural elements across the aligned family members.
  • 7. Certain amino acids are observed more frequently than others in α-helices, β-strands, and β-turns in crystal structures (see Figure). This leads to the idea that each amino acid tends to “prefer” being constrained in a certain type of secondary structure, or has an “intrinsic propensity” to adopt that secondary structure. Fig. 4-10 from Lehninger Principles of Biochemistry, 4th ed. The figure shows that: Glu, Met, Ala are most frequent in α-helices; Val, Tyr, Ile are most frequent in β-strands; Pro, Gly, Asn are most frequent in β-turns. Based on these data, it is believed that Glu has a high α-helical propensity, but a low β-strand propensity.
  • 8. Ab initio methods of secondary structure prediction: • These methods calculate the relative propensity (intrinsic tendency) of each amino acid in a protein sequence to belong to a certain secondary structural element. • Propensity scores for the 20 amino acids are derived from known protein structures: these propensities are calculated from the relative frequency of a given amino acid within the proteins, its frequency in a given type of secondary structure, and the fraction of all amino acids occurring in that type of secondary structure. • Stretches of a protein’s sequence that contain many residues with a high α-helical propensity are predicted to fold into α-helices. Stretches of sequence that contain many residues with a high β-strand propensity are predicted to fold into β-strands. • Two examples: Chou-Fasman method, GOR method
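The propensity calculation described on this slide can be sketched roughly as follows. This is a minimal illustration, not the Chou-Fasman or GOR implementation: the function name `propensities`, the one-letter state codes (H, E, C), and the toy sequences are all assumptions made for the example, and the data are made up rather than taken from real structures.

```python
from collections import Counter


def propensities(sequences, structures, state):
    """Propensity of each amino acid for one secondary-structure state.

    P(aa, state) = (n[aa in state] / n[aa]) / (n[state] / N)
    Values above 1 mean the residue is over-represented in that state.
    """
    aa_total = Counter()
    aa_in_state = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            if s == state:
                aa_in_state[aa] += 1
    n_residues = sum(aa_total.values())
    n_state = sum(aa_in_state.values())
    if n_state == 0:
        raise ValueError("state does not occur in the data")
    state_fraction = n_state / n_residues
    return {aa: (aa_in_state[aa] / aa_total[aa]) / state_fraction for aa in aa_total}


# Toy, made-up data (H = helix, E = strand, C = other); not real structures.
toy_sequences = ["AEAEGVVG", "EAAVGVEA"]
toy_structures = ["HHHHCEEC", "HHHECECH"]
helix = propensities(toy_sequences, toy_structures, "H")
```

In this toy set, Ala gets a helix propensity of 2.0 and Gly gets 0, which mirrors the kind of preference the slide describes. A real table would need a large, non-redundant set of solved structures.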
  • 9. Accuracy of ab initio methods: • These methods are not very accurate: • Chou-Fasman method: 50%-60% accuracy • GOR method: 64% accuracy; drastically underpredicts β-strands • These methods are only a little better than randomly assigning secondary structure! Known proteins consist of ~31% α-helix and ~28% β-sheet, so randomly assigning secondary structural elements to residues would result in ~30% accuracy. • Specific problems with these methods: • Tend to underpredict the lengths of α-helices and β-strands; can’t identify the first and last residues of helices and strands very well • Tend to miss β-strands completely
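The random baseline quoted on this slide can be checked with a short calculation. The sketch below assumes a three-state model in which the "other" fraction is simply the remainder (41%), which the slide does not state. `q3_accuracy` and `random_baseline` are illustrative names, not functions from any prediction package.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def random_baseline(fractions):
    """Expected accuracy when each residue's state is drawn at random
    with the same frequencies as the true composition."""
    return sum(f * f for f in fractions.values())


# 31% helix and 28% strand come from the slide; 41% "other" is an assumed remainder.
composition = {"H": 0.31, "E": 0.28, "C": 0.41}
baseline = random_baseline(composition)   # 0.31^2 + 0.28^2 + 0.41^2
uniform = 1 / 3                           # picking one of three states blindly
```

Under these assumptions the composition-matched baseline is about 34% and the uniform one about 33%, both in line with the slide's rough figure of ~30%.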
  • 10. A few homology-based secondary (2°) structure prediction methods: Neural network methods: • PROFsec (an improved version of PHDsec) http://www.predictprotein.org/ • PSIPRED http://bioinf.cs.ucl.ac.uk/psipred/ • SSpro (newest version is 4.0) http://scratch.proteomics.ics.uci.edu/ • SAM-T (SAM-T08 is the newest version; SAM-T06, SAM-T02, and SAM-T99 are older versions) http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html Nearest-neighbor methods: • NNSSP (no longer available online) • PREDATOR http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::predator HMM methods: • HMMSTR http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
  • 11. A few methods for predicting transmembrane α-helices: • TMHMM http://www.cbs.dtu.dk/services/TMHMM/ • HMMTOP http://www.enzim.hu/hmmtop/index.html • Phobius (also predicts presence of signal peptides) http://phobius.sbc.su.se/ • TopPred http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::toppred • PRED-TMR http://athina.biol.uoa.gr/PRED-TMR/ • DAS http://mendel.imp.ac.at/sat/DAS/DAS.html • TMpred http://www.ch.embnet.org/software/TMPRED_form.html • MEMSAT http://bioinf.cs.ucl.ac.uk/psipred/ Accuracies of the methods: Levels of accuracy are reported by the developers to be in the range of 75-95%. At least one study (2001) found TMHMM to be the best performing program. It is best to use several methods and compare the results to arrive at a consensus prediction. When different methods, specifically methods that are based on different algorithms, give similar results, the reliability of the results is higher.
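One simple way to combine several predictions into a consensus is a per-residue majority vote, sketched below. The method names and the prediction strings (M = membrane helix, - = not membrane) are hypothetical and do not reproduce the output format of any of the servers listed. Real outputs would first need to be parsed into per-residue labels.

```python
from collections import Counter


def consensus_prediction(predictions):
    """Per-residue majority vote over equally long label strings.

    Returns the consensus string and, for each residue, the fraction
    of methods that agree with the consensus label.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    labels, support = [], []
    for column in zip(*predictions):
        label, votes = Counter(column).most_common(1)[0]
        labels.append(label)
        support.append(votes / len(column))
    return "".join(labels), support


# Hypothetical per-residue outputs from three different methods.
toy_predictions = {
    "method_a": "--MMMMMM--",
    "method_b": "---MMMMM--",
    "method_c": "--MMMMM---",
}
consensus, support = consensus_prediction(list(toy_predictions.values()))
```

Residues where support is below 1.0 (here, the two helix ends) are the positions where the methods disagree and where the prediction is least reliable.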
  • 12. Tertiary structure prediction methods fall into three classes: (1) Homology modeling (also called comparative modeling): a structure is built based on the known structure of another protein that is similar in sequence (a homolog). (2) Threading (also called structural fold recognition): a structure is predicted for a protein by “threading” its sequence through a variety of known structures to determine which structure the sequence best fits. (3) Ab initio prediction (also called de novo prediction): a structure is predicted based only on the amino acid sequence of the protein, using the physicochemical properties of its residues and the principles governing protein folding.
  • 13. Homology modeling for tertiary structure prediction: Homology modeling is based on the idea that if two proteins share a high degree of sequence similarity (i.e., they are close homologs), they are likely to have very similar 3D structures. In general, proteins that share >30% sequence identity are likely to be quite similar in structure. Therefore, if a protein of unknown structure is similar in sequence to a protein of known structure, the known structure can be used as a template to which the unknown sequence is fit. The structure that is built for the unknown sequence is then called a homology model for the structure of that sequence. The “safe homology modeling zone,” above the gray curve, is the region where two proteins are likely to have the same structure. Fig. 5 from R. Nair & B. Rost, Protein Science (2002) 11: 2836-47.
  • 14. Steps in homology modeling for tertiary structure prediction: The protein of unknown structure for which a structural model is to be built will be called the “target sequence.” 1. Template selection: Identify protein(s) in the PDB that are homologous to the target sequence using BLAST or PSI-BLAST. If a close homolog with known structure is found, its structure will serve as a template to which the target sequence will be matched. The template should have at least 30% sequence identity with the target. (Proteins that share less than 30% sequence identity may not be similar enough in structure to carry out homology modeling.) If PSI-BLAST does not identify a suitable template, it will probably be necessary to construct a structural model by threading. It is possible to use multiple templates if more than one good template is identified. When multiple templates are available, it is best to use more than one template to avoid biasing the model toward a single protein. The template used in the next step of homology modeling will then be an averaged structure based on all of the chosen templates.
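The 30% identity cutoff for template selection can be illustrated with a small filter. This is a sketch only: the hit names and aligned strings are invented, and percent identity is computed here over columns where neither sequence has a gap, which is one of several conventions in use. BLAST reports its own identity figure.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def select_templates(hits, min_identity=30.0):
    """Keep hits at or above the identity cutoff, best first."""
    scored = [(name, percent_identity(t, m)) for name, (t, m) in hits.items()]
    kept = [item for item in scored if item[1] >= min_identity]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical pairwise alignments: name -> (aligned target, aligned template).
toy_hits = {
    "tmplA": ("ACDEFGHIKL", "ACDEFGHIKV"),
    "tmplB": ("ACDEFGHIKL", "ACQQQQQQQQ"),
    "tmplC": ("ACDEFGHIKL", "AC-EFGYYYY"),
}
templates = select_templates(toy_hits)
```

Here `tmplB` (20% identity) is rejected, and the two remaining hits could both be used as templates, as the slide recommends when more than one good template is available.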
  • 15. Steps in homology modeling for tertiary structure prediction: 2. Sequence alignment: Construct a multiple sequence alignment of the target, the template, and other homologous sequences. It is actually the alignment of the target and template that is of interest, but the inclusion of other homologs provides more information, helping to ensure that the best alignment of homologous residues is achieved. The quality of the target-template alignment is critical for constructing an accurate structural model for the target. If a given residue in the target is not aligned with the proper residue in the template, the error cannot be corrected in later steps of model building. A robust multiple sequence alignment program should be used for this step, and the resulting alignment should be very carefully examined and manually refined if necessary.
  • 16. Steps in homology modeling for tertiary structure prediction: 3. Backbone model building: Residues in the aligned regions of the target and template are assumed to adopt the same structure. Therefore, the backbone atoms of these residues in the target can be placed in the same 3D location as the backbone atoms of these residues in the template. See the alignment below as an example. Target: ...FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF... Template: ...FKQQANIHCAYCNGAYKIG-------GKELQVHFSWLF... For the aligned residues, backbone atoms of the target are assumed to occupy the same 3D location as those of the template. Target F aligned with template F: they are identical, so all atoms of target F will overlap the 3D positions of all atoms of template F. Target D aligned with template E: they are not identical, but their backbone atoms can be assumed to occupy the same 3D position, so backbone atoms of target D will overlap the 3D positions of backbone atoms of template E.
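The bookkeeping behind this step can be sketched as a walk over the alignment columns. The function `backbone_plan` and its "all atoms" / "backbone only" labels are illustrative assumptions; the two sequences are the fragments shown on the slide, numbered from 1 within the fragment. No coordinates are copied here, only the residue pairing is worked out.

```python
TARGET = "FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF"
# Template fragment from the slide; the 7-column gap is written explicitly.
TEMPLATE = "FKQQANIHCAYCNGAYKIG" + "-" * 7 + "GKELQVHFSWLF"


def backbone_plan(target_aln, template_aln):
    """List (target_index, template_index, mode) for every aligned column.

    Indices are 1-based positions in the ungapped sequences. Identical
    residues can inherit all atoms; different residues only the backbone.
    Columns with a gap in either sequence are skipped.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    plan = []
    t_idx = m_idx = 0
    for t, m in zip(target_aln, template_aln):
        if t != "-":
            t_idx += 1
        if m != "-":
            m_idx += 1
        if t != "-" and m != "-":
            mode = "all atoms" if t == m else "backbone only"
            plan.append((t_idx, m_idx, mode))
    return plan


plan = backbone_plan(TARGET, TEMPLATE)
```

In the resulting plan, the first F/F pair is marked "all atoms", while target D at position 29 is paired with template E at position 22 as "backbone only", matching the two cases called out on the slide.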
  • 17. Steps in homology modeling for tertiary structure prediction: 4. Loop building: There are likely to be regions in the alignment where gaps appear because the target sequence does not match the template. The target sequence residues in these gap regions are assumed to form a loop that is not present in the template structure. The structure of this loop can be built using several different methods. In any case, it is a difficult problem since the template provides no information to guide the building of the loop structure. Target: ...FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF... Template: ...FKQQANIHCAYCNGAYKIG-------GKELQVHFSWLF... “Extra” residues in the target sequence (here, AAASRTP) do not match the template and are assumed to form a loop.
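Finding the target residues that need loop building amounts to locating runs of template gaps in the alignment. The sketch below uses the alignment fragment from the slide; `find_target_loops` is an illustrative helper name, and positions are 1-based within the fragment.

```python
def find_target_loops(target_aln, template_aln):
    """Return (start, end, residues) for each run of target residues
    aligned to gaps in the template (1-based target positions)."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    loops = []
    t_idx = 0
    start, residues = None, []
    for t, m in zip(target_aln, template_aln):
        if t == "-" and m == "-":
            continue  # empty column, e.g. left over from a multiple alignment
        if t != "-":
            t_idx += 1
        if m == "-":
            if start is None:
                start = t_idx
            residues.append(t)
        elif start is not None:
            loops.append((start, t_idx - 1 if t != "-" else t_idx, "".join(residues)))
            start, residues = None, []
    if start is not None:
        loops.append((start, t_idx, "".join(residues)))
    return loops


target = "FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF"
template = "FKQQANIHCAYCNGAYKIG" + "-" * 7 + "GKELQVHFSWLF"
loops = find_target_loops(target, template)
```

For this fragment the function reports a single seven-residue insertion. The loop conformation itself still has to come from a separate loop-modeling method, since the template offers no coordinates for it.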
  • 18. Steps in homology modeling for tertiary structure prediction: 5. Side chain addition: The side chains are added to the backbone structure. Each side chain could potentially have many possible conformations (rotamers) due to bond rotation, but steric clashes with neighboring atoms are not allowed. Therefore, the side-chain conformations that have the lowest interaction energy with nearby atoms are chosen. Target: ...FKSQAAIHEAYCNFHYKVTAAASRTPEIDFDVHFSSIF... Template: ...FKQQANIHCAYCNGAYKIG-------GKELQVHFSWLF... Where target and template are both F, all atoms of the target side chain can be modeled as having the same 3D positions as the template side chain, at least initially. (Small changes in position may be necessary in later refinement steps.) Where target and template have different side chains (D vs. E), the side chain rotamer that is chosen for the target D must not overlap/clash with any neighboring atoms.
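Rotamer selection can be pictured as scoring each candidate conformation against its neighbors and keeping the best one. The sketch below substitutes a crude steric penalty for a real interaction energy: the 3.0 Å cutoff, the function names, and the one-atom "rotamers" are all assumptions for illustration, not values from a rotamer library or force field.

```python
import math


def clash_penalty(rotamer_atoms, neighbor_atoms, min_distance=3.0):
    """Sum of squared overlaps for atom pairs closer than min_distance (in Å).

    A stand-in for an interaction energy: 0.0 means no steric clash.
    """
    penalty = 0.0
    for a in rotamer_atoms:
        for b in neighbor_atoms:
            d = math.dist(a, b)
            if d < min_distance:
                penalty += (min_distance - d) ** 2
    return penalty


def pick_rotamer(rotamers, neighbor_atoms):
    """Name of the candidate rotamer with the lowest clash penalty."""
    return min(rotamers, key=lambda name: clash_penalty(rotamers[name], neighbor_atoms))


# Hypothetical coordinates: one neighboring atom and three candidate rotamers.
neighbors = [(0.0, 0.0, 0.0)]
toy_rotamers = {
    "r1": [(1.0, 0.0, 0.0)],   # 1.0 Å away: severe clash
    "r2": [(4.0, 0.0, 0.0)],   # 4.0 Å away: no clash
    "r3": [(2.5, 0.0, 0.0)],   # 2.5 Å away: mild clash
}
best = pick_rotamer(toy_rotamers, neighbors)
```

Modeling programs score rotamers with fuller energy functions and consider neighboring side chains jointly. This example only shows the "lowest score wins" selection step.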
  • 19. Steps in homology modeling for tertiary structure prediction: 6. Model refinement: Unfavorable bond angles, bond lengths, and atom contacts are likely to exist in the preliminary model, so an energy minimization procedure is applied to refine the model. In this procedure, atom positions are shifted so that the overall conformation of the entire structure has the lowest potential energy. Only limited energy minimization should be applied (a few hundred iterations) so that major errors are removed but residues are not moved from their correct positions. 7. Model evaluation: The model is checked for anomalies in dihedral angles, bond lengths, and atom contacts.
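The idea of a short, capped energy minimization can be shown on a deliberately tiny system. The sketch below is a toy under strong assumptions: atoms sit on a line, the only energy term is a harmonic bond-length term, and the ideal length, force constant, and step size are arbitrary. Real refinement uses a full force field (angles, dihedrals, nonbonded terms) in three dimensions.

```python
def bond_energy(xs, ideal=1.5, k=1.0):
    """Harmonic bond energy of a 1D chain: sum of k * (length - ideal)^2."""
    return sum(k * ((b - a) - ideal) ** 2 for a, b in zip(xs, xs[1:]))


def minimize(xs, steps=200, step_size=0.05, ideal=1.5, k=1.0):
    """Steepest descent for a fixed, limited number of iterations."""
    xs = list(xs)
    for _ in range(steps):
        grad = [0.0] * len(xs)
        for i in range(len(xs) - 1):
            g = 2 * k * ((xs[i + 1] - xs[i]) - ideal)
            grad[i] -= g       # dE/dx_i
            grad[i + 1] += g   # dE/dx_(i+1)
        xs = [x - step_size * g for x, g in zip(xs, grad)]
    return xs


start = [0.0, 1.0, 3.2, 4.5]          # bond lengths 1.0, 2.2, 1.3 (distorted)
refined = minimize(start, steps=200)  # "a few hundred iterations"
```

Capping `steps` reflects the slide's advice: the distorted bond lengths relax toward the ideal value while the atoms stay close to where they started.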
  • 20. Programs for homology modeling: Many programs for automated homology modeling are now available, so anyone can construct a homology model on a regular PC. However, construction of a “good” homology model (at least for sequences that are not highly similar) usually requires some expertise and usually should be done with human intervention, rather than in a fully automated fashion. A few of the freely available programs for homology modeling: • SWISS-MODEL: produces accurate models; fast; good tutorials available. http://swissmodel.expasy.org/ • I-TASSER: produces accurate models; easy to use, but slow. http://zhanglab.ccmb.med.umich.edu/I-TASSER/ • Modeller: must be downloaded and installed locally. http://salilab.org/modeller/modeller.html • WHAT IF http://swift.cmbi.ru.nl/servers/html/index.html http://swift.cmbi.ru.nl/whatif/
  • 21. Is a homology model CORRECT? Since the actual (experimentally determined) structure of the target is not known, there is no way to say whether or not the homology model is “correct.” Instead, the best a researcher can do is compare the homology model to the structure of the template from which it was derived. If the atom positions in the model do not deviate very much from those of the template, the homology model is said to be “accurate.” The greater the deviation between model and template, the lower the accuracy of the model. When is a homology model definitely INCORRECT? A homology model has regions that are incorrect if it contains structural features that do not occur in native proteins, such as: • Hydrophobic side chains on the surface of the model (these side chains should be buried) • Unreasonable bond lengths or angles • Unfavorable noncovalent contacts between atoms (clashes) • Unreasonable dihedral angles
  • 22. Accuracy of homology modeling: The template selection and alignment accuracy are crucial to the accuracy of a homology model. The accuracy of the model depends on the percentage of sequence identity between the target and template. The average coordinate agreement between the modeled structure and the actual structure drops ~0.3 Å for each 10% reduction in sequence identity. The largest structural differences between homologous proteins are in surface loops. In other words, the structure of the protein core is more highly conserved. Therefore, the regions that are most likely to be in error in a homology model are the surface loops. High-accuracy homology models can be built when the target and template have 50% or greater sequence identity. Errors are mostly mistakes in side-chain packing, small shifts of the core backbone regions, and occasionally larger errors in loops. Medium-accuracy homology models can be built when the proteins share 30-50% sequence identity. There can be alignment mistakes, and there are more frequent side-chain packing, core distortion, and loop modeling errors. Low-accuracy homology models are based on proteins that share <30% sequence identity. If a model is based on an almost insignificant alignment to a known structure, the model may have an entirely incorrect fold. The best model-building programs will produce models of similar accuracy, provided that the methods are used optimally.
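The three accuracy bands on this slide map directly onto a small lookup. The function name `model_accuracy_tier` is an assumption, and placing exactly 30% in the medium band follows the slide's "30-50%" wording. The bands are rules of thumb, not guarantees for any particular model.

```python
def model_accuracy_tier(percent_identity):
    """Expected accuracy band of a homology model from target-template identity."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= 50:
        return "high"    # errors mostly in side-chain packing and some loops
    if percent_identity >= 30:
        return "medium"  # alignment mistakes and core distortions become more frequent
    return "low"         # the fold itself may be wrong


tiers = {pid: model_accuracy_tier(pid) for pid in (72, 50, 41, 30, 18)}
```

A check like this is useful mainly as an early warning: a template in the "low" band suggests that threading may be a better route than homology modeling.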