Computational Protein Design
                         2. Computational Protein Design Techniques


                                           Pablo Carbonell
                           pablo.carbonell@issb.genopole.fr

                               iSSB, Institute of Systems and Synthetic Biology
                              Genopole, University d’Évry-Val d’Essonne, France



                                     mSSB: December 2010




Pablo Carbonell (iSSB)                    Computational Protein Design            mSSB: December 2010   1 / 45
Outline



1   Introduction

2   Computational Protein Descriptors

3   Sequence-based CPD

4   Structure-based CPD

5   Search Algorithms in CPD

6   De Novo Design

7   Challenges in Sequence and Structure-Based CPD




Computational Protein Design




A Blueprint of CPD Approaches




∗ RS : research studies
Molecular Signature Descriptors



   A 2D representation of the molecular graph as an undirected colored graph
   G(V, E, C), with V: atoms, E: bonds, C: atom type
   The signature descriptor of height h of atom x in the molecular graph G, or
   ${}^{h}\sigma(x)$, is a canonical representation of the subgraph of G containing
   all atoms that are at distance up to h from x
   Atomic signatures are summed into the signature of the molecule:

   \[ {}^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma(x) \tag{1} \]

   The signature is a systematic codification of the molecular graph
   [Faulon et al., 2004]


                                            σ(methylcyclopropane) =
                                            1   [C]([H][C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H]))
                                            2   [C]([H][H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H]))
                                            1   [C]([H][H][H][C]([H][C]([H][H][C,0])[C,0]([H][H])))
                                            1   [H]([C]([C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H])))
                                            4   [H]([C]([H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H])))
                                            3   [H]([C]([H][H][C]([H][C]([H][H][C,0])[C,0]([H][H]))))




Molecular Signature of Reactions and Proteins


   Signature of a reaction. The signature of reaction R

   \[ S_1 + S_2 + \dots + S_n \rightarrow P_1 + P_2 + \dots + P_m \tag{2} \]

   that transforms n substrates into m products is given by the difference between the
   signature of the products and the signature of the substrates:

   \[ {}^{h}\sigma(R) = \sum_{p \in P} {}^{h}\sigma(p) - \sum_{s \in S} {}^{h}\sigma(s) \tag{3} \]

   Signature of protein sequences. The protein P is represented by the linear
   chain given by its collapsed graph at residue level, a reduced molecular graph
   representation G(V, E, C) known as string signature, where V: residues a ∈ A,
   E: contiguity in sequence, C: amino acid type

   \[ {}^{h}\sigma(P) = \sum_{a \in A} {}^{h}\sigma(a) \tag{4} \]
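   As a rough illustration of Eq. (4), the sketch below counts residue-level signatures along a chain. It assumes that, in the collapsed linear graph, the height-h signature of a residue reduces to the window of residues within h positions of it, written in N-to-C order; the function name `string_signature` and the example sequence are hypothetical and not taken from the slides.

```python
from collections import Counter


def string_signature(sequence: str, h: int) -> Counter:
    """Count height-h residue signatures along a linear chain.

    Assumption: every residue is connected only to its sequence
    neighbors, so its height-h signature is the window of residues
    within h positions of it (truncated at the chain termini).
    """
    counts = Counter()
    for i in range(len(sequence)):
        left = max(0, i - h)
        window = sequence[left:i + h + 1]
        center = i - left
        # Bracket the central residue so truncated terminal windows
        # remain distinguishable from interior ones.
        key = window[:center] + "[" + window[center] + "]" + window[center + 1:]
        counts[key] += 1
    return counts


sig = string_signature("MKVLA", 1)
for key, count in sorted(sig.items()):
    print(count, key)
```

   Two sequences can then be compared through their signature counts, in the same way Eq. (3) subtracts substrate counts from product counts.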




Protein Contact Maps




   The protein contact map is a graph
   representation of the 3D interactions
   at residue level G(V, E, C), where V:
   residues, E: contacts, C: amino acid
   type
   Two residues are considered to
   interact when atoms of the two
   residues lie closer than a
   predetermined threshold (typically
   4.5 to 5 Å)
   Contact maps can account for
   long-range interactions and
   conformational states
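   The contact definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: residues are given as lists of atom coordinates, a contact requires at least one inter-residue atom pair below the threshold, and the toy coordinates and the name `contact_map` are invented for the example.

```python
from itertools import combinations
from math import dist


def contact_map(residues, threshold=4.5):
    """Return the set of residue index pairs (i, j), i < j, in contact.

    `residues` is a list of residues, each a list of (x, y, z) atom
    coordinates in Å. Two residues are in contact when at least one
    inter-residue atom pair is closer than `threshold`.
    """
    contacts = set()
    for (i, atoms_i), (j, atoms_j) in combinations(enumerate(residues), 2):
        if any(dist(a, b) < threshold for a in atoms_i for b in atoms_j):
            contacts.add((i, j))
    return contacts


# Hypothetical toy coordinates, not a real structure.
toy = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(3.8, 0.0, 0.0), (5.0, 1.0, 0.0)],
    [(12.0, 0.0, 0.0)],
    [(1.0, 3.0, 0.0)],
]
edges = contact_map(toy)
print(sorted(edges))
```

   In practice, residues adjacent in sequence are trivially in contact and are often filtered out, which this sketch does not do.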

                                                 Song et al. [2010]




Sequence and Structure-Based CPD




   Sequence-based CPD methods are in some cases a good trade-off between
   complexity of the model and accuracy of the predictions



Sequence-based Knowledge-based potentials



   The simplest way to score a protein and to identify active regions is through amino
   acid scales or indices
   AAindex is a database of
        544 amino acid indices
        94 amino acid matrices
        47 amino acid pairwise contact potentials
   Examples: hydrophobicity, accessibility, van der Waals volume,
   secondary structure propensity, flexibility
   This approach is widely used when analyzing conserved motifs and
   correlated mutations in protein fold families through multiple alignments
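   A minimal sketch of scoring a sequence with an amino acid scale, assuming the Kyte-Doolittle hydropathy values as the example index and a plain sliding-window average; the helper name `scale_profile`, the window size and the test sequence are illustrative choices, not something prescribed by the slides.

```python
# Kyte-Doolittle hydropathy values, one well-known amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def scale_profile(sequence, scale, window=5):
    """Average the scale value over each full window along the sequence."""
    values = [scale[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


profile = scale_profile("MKTLLVAGG", KYTE_DOOLITTLE, window=3)
# Index of the most hydrophobic window in this toy sequence.
peak = profile.index(max(profile))
print(peak, round(profile[peak], 2))
```

   Any other index (accessibility, volume, flexibility) can be swapped in by passing a different dictionary as `scale`.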




Quantitative Structure-Activity Relationship (QSAR) Techniques

   QSAR is a statistical method used extensively by the chemical and
   pharmaceutical industries in small-molecule and peptide optimization
   The goal is to model causal relationships between
        structures of interacting molecules
        measurable properties of scientific or commercial interest, such as
        ADME/Tox (absorption, distribution, metabolism, excretion, and toxicity) of
        drugs




QSAR Model Evaluation




   The predictive ability of a model is generally evaluated through the leave-one-out
   (LOO) cross-validation correlation coefficient q²
   Partial least-squares (PLS) regression is commonly used
   Additional nonlinear terms can be added through the use of nonlinear regression
   or machine learning techniques (kernel methods, random forests, etc.)
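   The leave-one-out q² can be illustrated with a small Python sketch. To keep it self-contained it swaps PLS for an ordinary least-squares fit on a single descriptor, and it assumes the common convention q² = 1 − PRESS / SS_total with SS_total taken around the mean of all observations; the data and the function names are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x for one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without sample i, then predict the held-out sample.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and measured activities.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1]
q2 = loo_q2(descriptor, activity)
print(round(q2, 3))
```

   A q² close to 1 indicates that each held-out sample is predicted well by a model trained on the others; values near or below 0 indicate no predictive ability.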




QSAR Modeling Workflow




The ProSAR Algorithm




   An extension of SAR-based approaches to CPD
   It formalizes the decision-making process about which mutations to include in
   combinatorial libraries

   \[ y = \sum_{i=1}^{N} \sum_{j \in A} c_{ij} x_{ij} \tag{5} \]

        y: the predicted function (activity) of the protein sequence
        c_ij: the regression coefficients corresponding to the mutational effect of having residue
        j among the 20 amino acids A at position i
        x_ij: binary variable indicating the presence or absence of residue j at position i
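   Eq. (5) can be evaluated directly once the coefficients are known. The sketch below is a hedged illustration: the coefficient values, the variant sequences and the name `predict_activity` are hypothetical, positions are 0-based, and the fitting of the coefficients (the regression step) is not shown.

```python
def predict_activity(sequence, coefficients):
    """Evaluate y = sum_i sum_j c_ij * x_ij for one sequence.

    `coefficients` maps (position, residue) -> c_ij; pairs that are
    absent contribute zero. x_ij is 1 exactly when `sequence` carries
    residue j at position i, so the double sum reduces to one lookup
    per position.
    """
    return sum(coefficients.get((i, aa), 0.0) for i, aa in enumerate(sequence))


# Hypothetical regression coefficients for two variable positions.
coeffs = {(0, "A"): 0.2, (0, "G"): -0.3, (2, "L"): 0.5, (2, "F"): 0.8}
variants = ["AKL", "GKF", "AKF"]
ranked = sorted(variants, key=lambda s: predict_activity(s, coeffs), reverse=True)
print(ranked)
```

   Mutations with large positive coefficients are natural candidates to keep in the next combinatorial library, while those with negative coefficients can be dropped.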




Improving Catalytic Function by ProSAR-driven Enzyme Evolution




   Statistical analysis of protein sequence-activity relationships
   Bacterial biocatalysis of Atorvastatin (Lipitor), a cholesterol-lowering drug
   [Figure: Codexis Inc.]


Structure-based CPD




  Energy functions and molecular force fields
  Local conformational restrictions
  Predicting entropic factors
  Protein topological properties




                                                                  From Narasimhan et al. [2010]




Energy Functions and Molecular Force Fields




   In structure-based CPD, folds are usually
   represented by the spatial coordinates of the
   backbone atoms, or design scaffold
   Protein design is done by placing amino acid
   side chains along the scaffold

   Side chains are only permitted to assume a
   discrete set of statistically preferred
   conformations: rotamers
   Rotamer/backbone and rotamer/rotamer
   interaction energies are tabulated
   These potential energies can then be
   approximated by using any of the standard
   force fields: CHARMM, AMBER, GROMOS



Molecular Force Fields

AMBER: a classical force field for energy and MD calculations:

\[
V(r^N) = \sum_{\text{bonds}} \tfrac{1}{2} k_b (l - l_0)^2
       + \sum_{\text{angles}} \tfrac{1}{2} k_a (\theta - \theta_0)^2
       + \sum_{\text{torsions}} \tfrac{1}{2} V_n \left[ 1 + \cos(n\omega - \gamma) \right]
       + \sum_{j=1}^{N-1} \sum_{i=j+1}^{N} \left\{ \epsilon_{i,j} \left[ \left( \frac{r_{0ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{0ij}}{r_{ij}} \right)^{6} \right] + \frac{q_i q_j}{4 \pi \epsilon_0 r_{ij}} \right\} \tag{6}
\]

 1    $\sum_{\text{bonds}}(\cdot)$: energy between covalently bonded atoms.
 2    $\sum_{\text{angles}}(\cdot)$: energy due to the geometry of electron orbitals involved in covalent
      bonding.
 3    $\sum_{\text{torsions}}(\cdot)$: energy for twisting a bond due to bond order (e.g. double bonds) and
      neighboring bonds or lone pairs of electrons.
 4    $\sum_{j=1}^{N-1} \sum_{i=j+1}^{N}(\cdot)$: non-bonded energy between all atom pairs:
         1      van der Waals energies
         2      Electrostatic energies
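The non-bonded double sum of Eq. (6) is sketched below in Python. The example rests on several simplifying assumptions that are not in the slides: a single well depth `epsilon` and minimum-energy distance `r0` are shared by all atom pairs, charges are in units of the elementary charge, distances are in Å, and the Coulomb prefactor is taken as roughly 332 kcal·Å/(mol·e²). Real force fields also exclude or scale pairs that are close in the bond graph, which is omitted here.

```python
from itertools import combinations
from math import dist

# Approximate value of 1/(4*pi*eps0) in kcal*Å/(mol*e^2); an assumption
# made for this sketch, check the force-field documentation for the
# exact constant a given package uses.
COULOMB_KCAL = 332.06


def nonbonded_energy(atoms, epsilon, r0):
    """Sum the 12-6 van der Waals and Coulomb terms over all atom pairs.

    `atoms` is a list of (x, y, z, charge) tuples; `epsilon` (well
    depth) and `r0` (distance of the energy minimum) are shared by all
    pairs for brevity.
    """
    total = 0.0
    for a, b in combinations(atoms, 2):
        r = dist(a[:3], b[:3])
        ratio6 = (r0 / r) ** 6
        vdw = epsilon * (ratio6 ** 2 - 2.0 * ratio6)
        coulomb = COULOMB_KCAL * a[3] * b[3] / r
        total += vdw + coulomb
    return total


# Two neutral atoms placed exactly at the minimum-energy distance.
pair = [(0.0, 0.0, 0.0, 0.0), (3.5, 0.0, 0.0, 0.0)]
print(nonbonded_energy(pair, epsilon=0.1, r0=3.5))
```

With this form of the 12-6 term the energy at r = r0 equals −ε, which is why the factor 2 appears in front of the sixth-power term.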



Structure-based Knowledge-based Potentials

       They are built by performing a large-scale statistical study of structural databases
       such as PDB (Protein Data Bank)
               Rotamer libraries (∼ 150 rotameric states)
               Binary patterning: only some types of amino acids are allowed, based on the
               hydrophobic environment
               An implicit solvation model
               Secondary structure propensity
               Frequency of small segments in the PDB
               Pairwise potentials
               van der Waals interactions
               Hydrogen bonding
               Electrostatics
               Entropy-based penalties for flexible side-chains




From Boas and Harbury [2007]

Energy Functions



   Design along the backbone or scaffold
   Rotamer/backbone and rotamer/rotamer interaction energies are tabulated
   Precomputed from molecular force fields: CHARMM, AMBER, GROMOS

Total energy of the protein

   \[ E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l) \tag{7} \]

   N: length of the protein (k and l run over its positions)
   r_k: the rotamer of the kth side chain
   E_k(r_k): the self-energy of a particular rotamer r_k
   E_kl(r_k, r_l): the pair energy of rotamers r_k, r_l
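   A small Python sketch of Eq. (7) evaluated from tabulated energies. It assumes that each pair energy is stored and counted once, for k < l, and it uses a brute-force enumeration to find the lowest-energy assignment; all energy values and names are hypothetical.

```python
from itertools import product


def total_energy(assignment, self_energy, pair_energy):
    """E_TOT for one rotamer assignment.

    assignment[k]             -> rotamer index chosen at position k
    self_energy[k][r]         -> E_k(r)
    pair_energy[(k, l)][r][s] -> E_kl(r, s), stored once per pair k < l
    """
    n = len(assignment)
    energy = sum(self_energy[k][assignment[k]] for k in range(n))
    for k in range(n):
        for l in range(k + 1, n):
            energy += pair_energy[(k, l)][assignment[k]][assignment[l]]
    return energy


# Hypothetical tables: 3 positions, 2 rotamers each.
self_energy = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_energy = {
    (0, 1): [[1.0, -1.0], [0.0, 2.0]],
    (0, 2): [[0.0, 0.5], [-0.5, 0.0]],
    (1, 2): [[0.3, 0.0], [0.0, -0.4]],
}

# Exhaustive search over all 2**3 assignments (only feasible for toy sizes).
best = min(product(range(2), repeat=3),
           key=lambda a: total_energy(a, self_energy, pair_energy))
print(best, round(total_energy(best, self_energy, pair_energy), 3))
```

   Because the tables are precomputed, scoring one assignment costs only table lookups, which is what makes the search algorithms of the following section practical.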




The Role of Dynamics

   Besides protein structure, protein dynamics can play a direct role in molecular
   recognition
   Flexible proteins recognize their targets through induced fit or conformational
   selection, likely showing promiscuity
   Binding is commonly enthalpy-driven, but in some cases entropy is important, for
   instance:
          Proteins with multiple binding sites
          Small hydrophobic molecules
   Two sources of protein motion:
          Protein flexibility: intraconformational dynamics (fast time-scale motions)
          Conformational heterogeneity: interconformational dynamics

  Gibbs free energy:

  \[ \Delta G = \Delta H - T \Delta S \tag{8} \]
  \[ \Delta S = \Delta S_{solv} + \Delta S_{conf} + \Delta S_{rt} \tag{9} \]

  ∆S_conf: conformational entropy of protein and ligand

  ∆S_rt: rotational and translational degrees of freedom


Predicting Side-chain Dynamics from Structural Descriptors




   The Lipari-Szabo model-free approach quantifies motions from
   NMR experiments by computing the generalized order parameter S²
         Protein backbone dynamics: ¹⁵NH and ¹³CαH NMR relaxation methods
         Protein side-chain methyl dynamics: ¹³CαH NMR relaxation methods (side-chain
         motions in the picosecond-to-nanosecond time regime)
   From the BMRB we compiled S² data for 18 proteins, including 10 proteins in 2 or
   more different states: calmodulin, barnase, PDZ, MUP, DHFR, staphylococcal
   nuclease, Pin1, SH3 domain, MSG
   This technique provides measurements only for the methyl groups of side
   chains: ALA, LEU, ILE, MET, THR, VAL




Structural Descriptors of Methyl Dynamics



   We consider the following parameters influencing side-chain dynamics:
         Packing density at the methyl site i and its neighboring residues j within a sphere of
         r = 5 Å

         \[ P_i = \sum_{r_{ij} < 5\,\text{Å}} C_j \, e^{-r_{ij}} = \sum_{r_{ij} < 5\,\text{Å}} \left( \sum_{r_{jk} < 5\,\text{Å}} e^{-r_{jk}} \right) e^{-r_{ij}} \tag{10} \]

         Side-chain stiffness: number of dihedral angles separating the backbone from the
         methyl carbon, weighted by the side-chain packing
         Rotameric state: angular distance ∆χ = χ − χ0 to the closest rotameric state χ0 in
         the library
         Elongation: distance from the methyl site to the Cα
         Pairwise contact potential: a knowledge-based potential of the frequency of contacts
         between residues at several distances, computed from the PDB
         Solvation effect: DSSP accessibility and residue hydrophobicity
         Van der Waals contacts
         Hydrogen bonds (in the case of threonine)
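   The packing density of Eq. (10) can be sketched as a nested neighbor sum. This is an illustration under assumptions that the slide does not spell out: sites are plain 3D points (for instance heavy atoms), distances are in Å, and a site is never counted as its own neighbor; the name `packing_density` and the coordinates are hypothetical.

```python
from math import dist, exp


def packing_density(i, coords, cutoff=5.0):
    """P_i = sum_j C_j * exp(-r_ij), with C_j = sum_k exp(-r_jk).

    Both sums run over sites closer than `cutoff` (in Å). Assumption:
    a site is never its own neighbor (j != i and k != j).
    """
    def neighbors(a):
        return [b for b in range(len(coords))
                if b != a and dist(coords[a], coords[b]) < cutoff]

    total = 0.0
    for j in neighbors(i):
        # C_j: how crowded the neighborhood of neighbor j is.
        c_j = sum(exp(-dist(coords[j], coords[k])) for k in neighbors(j))
        total += c_j * exp(-dist(coords[i], coords[j]))
    return total


# Three collinear toy sites, 3 Å apart.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
print(packing_density(0, coords))
```

   The nested form means that a methyl site scores higher when its neighbors are themselves densely packed, not merely when it has many neighbors.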




Predicting Methyl Side-chain Dynamics
Algorithm: neural network
Cross-validation: r = 0.71 ± 0.029 (p-value = 4.6 × 10^−87)
Example: experimental and predicted changes ∆S² of barnase after binding barstar
(∆S² > 0: rigidification; ∆S² < 0: flexibilization)

           Protein        MD method   r (MD)   r (nnet)

           ubiquitin      AMBER99SB   0.81     0.81
           TNfn3          CHARMM 22   0.62     0.79
           FNfn10         CHARMM 22   0.51     0.64
           barnase        OPLS-AA/L   0.55     0.64
           calmodulin     FDPB        0.60     0.72

[Carbonell and del Sol, 2009]

Search Algorithms in CPD




Search Algorithms



   Objective: finding the best design within the space of all possible amino
   acid/rotameric states
   A vast search space: 20^N or p^N
        N: number of positions to mutate
        p: number of rotameric states
   Strategies
        Deterministic algorithms
                Dead-end elimination (DEE) algorithm: a pruning method.
                Some accelerations of the DEE algorithm: upper-bound estimation; the “magic bullet” metric;
                conformational splitting; background optimization
        Stochastic algorithms
                Monte Carlo
                Simulated annealing
                Genetic algorithms




The DEE Algorithm



   It assumes that the energy of the protein can be written as

   \[ E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l) \tag{11} \]

   N: length of the protein
   r_k: the rotamer of the kth side chain
   E_k(r_k): the self-energy of a particular rotamer r_k
   E_kl(r_k, r_l): the pair energy of the rotamers r_k, r_l
   Complexity:
         Single search scales quadratically with the total number of rotamers: O((p × N)^2)
         Pair search scales cubically: O((p × N)^3)
         Brute-force enumeration: O(p^N)




The DEE Algorithm


   Single rotamers and rotamer pairs are eliminated during the computational cycles
        Single elimination: eliminate rotamer $r_k^A$ at position k if some other rotamer
        $r_k^B$ at the same position always gives a better energy, that is, if the best
        case for A is still worse than the worst case for B

        \[ E_k(r_k^A) + \sum_{l=1}^{N} \min_X E_{kl}(r_k^A, r_l^X) > E_k(r_k^B) + \sum_{l=1}^{N} \max_X E_{kl}(r_k^B, r_l^X) \tag{12} \]

        Pairs elimination: eliminate a pair of rotamers $(r_k^A, r_l^B)$ at two positions if
        there exists another pair $(r_k^C, r_l^D)$ that always gives a better energy

        \[ U_{kl}^{AB} \overset{\mathrm{def}}{=} E_k(r_k^A) + E_l(r_l^B) + E_{kl}(r_k^A, r_l^B) \tag{13} \]

        \[ U_{kl}^{AB} + \sum_{i=1}^{N} \min_X \left( E_{ki}(r_k^A, r_i^X) + E_{li}(r_l^B, r_i^X) \right) > U_{kl}^{CD} + \sum_{i=1}^{N} \max_X \left( E_{ki}(r_k^C, r_i^X) + E_{li}(r_l^D, r_i^X) \right) \tag{14} \]

   Values are precomputed and stored in energy matrices
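   The singles criterion of Eq. (12) can be sketched as an iterative pruning loop. The code below is a minimal illustration, not the implementation of any particular package: it assumes pair energies stored once per pair k < l, skips the term l = k in the sums, restricts the min and max to rotamers that are still alive, and uses hypothetical energy tables.

```python
def e_pair(pair_energy, k, r, l, s):
    """Look up E_kl(r, s) from a table stored once per pair with k < l."""
    if k < l:
        return pair_energy[(k, l)][r][s]
    return pair_energy[(l, k)][s][r]


def dee_singles(self_energy, pair_energy):
    """Iteratively prune rotamers with the singles criterion of Eq. (12).

    Returns, for each position, the set of surviving rotamer indices.
    """
    n = len(self_energy)
    alive = [set(range(len(e))) for e in self_energy]

    def bound(k, r, pick):
        # Best (pick=min) or worst (pick=max) energy that rotamer r at
        # position k can reach over the surviving rotamers elsewhere.
        return self_energy[k][r] + sum(
            pick(e_pair(pair_energy, k, r, l, s) for s in alive[l])
            for l in range(n) if l != k
        )

    changed = True
    while changed:
        changed = False
        for k in range(n):
            for a in sorted(alive[k]):
                if any(bound(k, a, min) > bound(k, b, max)
                       for b in alive[k] if b != a):
                    alive[k].discard(a)
                    changed = True
    return alive


# Hypothetical tables: 3 positions, 2 rotamers each.
self_energy = [[0.0, 5.0], [0.0, 0.2], [0.0, 4.0]]
pair_energy = {
    (0, 1): [[0.5, -0.5], [0.0, 1.0]],
    (0, 2): [[0.0, 0.3], [-0.3, 0.0]],
    (1, 2): [[0.4, 0.0], [-0.6, 0.0]],
}
survivors = dee_singles(self_energy, pair_energy)
print(survivors)
```

   In this toy case a single rotamer survives at every position, so the pruning alone identifies the minimum-energy assignment; in general DEE only shrinks the search space, and the remaining combinations still have to be enumerated or searched.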



Stochastic Algorithms

   Search in the space of feasible designs by making a series of combinations of
   random and directed moves
   Monte Carlo Metropolis: a move consists of exchanging one rotamer for another
   at a randomly chosen position; a modification is accepted if it lowers the energy,
   and otherwise with Boltzmann probability exp(−∆E/kT)
   Simulated annealing: a high initial temperature lets the first cycles of the search
   explore widely, and the temperature is then gradually lowered so that the search
   settles into a low-energy solution
   Genetic algorithms: a population of models is propagated (evolved) throughout
   the course of the run, and genetic operators, such as recombination, are used to
   create new models from existing parents
   They are fast and can be scaled up to problems of large complexity
   They are not guaranteed to converge to the optimal solution
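   A minimal sketch of Metropolis Monte Carlo combined with simulated annealing over rotamer assignments. The geometric cooling schedule, the parameter values, the toy energy function and the name `anneal` are all assumptions made for illustration; kT is folded into a single temperature parameter.

```python
import math
import random


def anneal(energy, n_positions, n_rotamers, steps=5000,
           t_start=2.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule.

    `energy` maps a tuple of rotamer indices to a number. Returns the
    best assignment seen during the run and its energy.
    """
    rng = random.Random(seed)
    state = [rng.randrange(n_rotamers) for _ in range(n_positions)]
    e_state = energy(tuple(state))
    best, e_best = tuple(state), e_state
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Move: swap the rotamer at one randomly chosen position.
        k = rng.randrange(n_positions)
        old = state[k]
        state[k] = rng.randrange(n_rotamers)
        e_new = energy(tuple(state))
        delta = e_new - e_state
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            e_state = e_new
            if e_state < e_best:
                best, e_best = tuple(state), e_state
        else:
            state[k] = old  # reject: restore the previous rotamer
    return best, e_best


# Hypothetical energy landscape with a known minimum at `target`.
target = (2, 0, 3, 1, 2)


def toy_energy(assignment):
    return sum((r - t) ** 2 for r, t in zip(assignment, target))


best, e_best = anneal(toy_energy, n_positions=5, n_rotamers=4)
print(best, e_best)
```

   Tracking the best assignment seen, rather than returning the final state, is a common safeguard because late uphill moves can otherwise discard a good solution.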




The SCHEMA Algorithm




  Equivalent to an in silico directed evolution
  Consists of scoring libraries of hybrid protein
  sequences against the parental sequences
  Scoring:
       Calculate the number of interactions between residues
       (contacts within 4.5 Å) that are disrupted in the creation
       of hybrid proteins
       Hybrids are scored for stability by counting the number of
       disruptions
       Protein is partitioned into blocks that should not be
       interrupted by crossovers (analogous to genetic algorithms)
  From [Meyer et al., 2006]
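  The disruption count can be sketched as follows, assuming the usual SCHEMA convention that a contact is disrupted when the residue pair found at the two contacting positions of the hybrid does not occur at those positions in any parent. The sequences, the contact list and the name `schema_disruption` are hypothetical.

```python
def schema_disruption(chimera, parents, contacts):
    """Count contacts (i, j) whose residue pair occurs in no parent.

    `chimera` and every parent are aligned sequences of equal length;
    `contacts` is an iterable of residue index pairs taken from a
    contact map (for example residues within 4.5 Å).
    """
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((p[i], p[j]) != pair for p in parents):
            broken += 1
    return broken


# Two hypothetical aligned parents and one crossover after position 1.
parents = ["ACDEF", "GCHKF"]
chimera = parents[0][:2] + parents[1][2:]   # "ACHKF"
contacts = [(0, 3), (1, 4), (2, 3)]
print(schema_disruption(chimera, parents, contacts))
```

  Here only the contact (0, 3) is disrupted, because its two residues come from different parents and differ between them; contacts inside one block, or between conserved residues, are never counted.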




The OPTCOM and IPRO Algorithms for Library Design

       The OPTCOM algorithm:
               Balances size and quality of the library
       The IPRO algorithm:
               Identifies point mutations in the parent sequences
               using energy-based scoring functions
               Residue and rotamer choices are driven by a
               mixed-integer linear programming (MILP) formulation




From [Saraf et al., 2006]


Some Web Resources


   IPRO: Iterative Protein Redesign and Optimization.
   http://maranas.che.psu.edu/IPRO.htm
   EGAD: A Genetic Algorithm for protein Design.
   http://egad.ucsd.edu/software.php
   RosettaDesign: A software package.
   http://rosettadesign.med.unc.edu/
   SCHEMA: A pair-wise energy function for scoring protein chimeras made from
   homologous proteins.
   http://www.che.caltech.edu/groups/fha/schema-tools/schema-overview.html
   SHARPEN: Systematic Hierarchical Algorithms for Rotamers and Proteins on
   an Extended Network.
   http://koko.che.caltech.edu/sharpenabout.html
   WHAT IF: Software for protein modelling, design, validation, and
   visualisation. http://swift.cmbi.ru.nl/whatif/
   FoldX: A force field for energy calculations and protein design.
   http://foldx.crg.es/


De Novo-Designed Proteins



   In de novo designs, some assumptions are needed in order to make the search
   space tractable
   Usually we start from some basic motifs or domains as scaffolds for the design
   Examples:
        βαβ motif resembling a zinc finger
        3 and 4 helix bundles
        Helical coiled-coils
   Helix bundle motifs can be parametrized using a few global variables that
   describe the global structure
   Applications:
        New metal-binding sites
        Nonbiological cofactors for novel biomaterials and electromechanical devices
        Novel enzymatic activities




Example: De Novo Design of a Metalloprotein




   Computational de novo design of a four-helix bundle (108 residues) containing the
   non-biological cofactor iron diphenyl porphyrin (DPP-Fe) [Bender et al., 2007]
         The initial helix bundle was selected as a low-energy structure computed with MCSA
         STITCH: a program to select loops connecting helices from PDB Select
         CHARMM and PROCHECK for removing overlaps
         4 His and 4 Thr residues to support the 6-point coordination of the Fe(III) cations
         SCADS: provides site-dependent amino acid probabilities in each round



Challenges in Sequence and Structure-Based CPD



Modeling
    Greater availability of 3D protein structural information
    More accurate energy functions
    Improvement of rigid and flexible docking


Design
    Improvement in search algorithms
    Parametrization for non-natural amino acids

Prediction
    Beyond additive models: using machine-learning algorithms
    More complete environment descriptors




Bibliography I



Gretchen M. Bender, Andreas Lehmann, Hongling Zou, Hong Cheng, H. Christopher Fry, Don Engel, Michael J. Therien, J. Kent Blasie, Heinrich Roder,
    Jeffrey G. Saven, and William F. DeGrado. De Novo Design of a Single-Chain Diphenylporphyrin Metalloprotein. Journal of the American Chemical
    Society, 129(35):10732–10740, September 2007. ISSN 0002-7863. doi: 10.1021/ja071199j. URL http://dx.doi.org/10.1021/ja071199j.
F. Edward Boas and Pehr B. Harbury. Potential energy functions for protein design. Current opinion in structural biology, 17(2):199–204, April 2007. ISSN
    0959-440X. doi: 10.1016/j.sbi.2007.03.006. URL http://dx.doi.org/10.1016/j.sbi.2007.03.006.
Pablo Carbonell and Antonio del Sol. Methyl side-chain dynamics prediction based on protein structure. Bioinformatics, pages btp463+, July 2009. doi:
    10.1093/bioinformatics/btp463. URL http://dx.doi.org/10.1093/bioinformatics/btp463.
Jean-Loup L. Faulon, Michael J. Collins, and Robert D. Carr. The signature molecular descriptor. 4. Canonizing molecules using extended valence
   sequences. Journal of chemical information and computer sciences, 44(2):427–436, 2004. ISSN 0095-2338. doi: 10.1021/ci0341823. URL
   http://dx.doi.org/10.1021/ci0341823.
Michelle M. Meyer, Lisa Hochrein, and Frances H. Arnold. Structure-guided SCHEMA recombination of distantly related β-lactamases. Protein Engineering
    Design and Selection, 19(12):563–570, December 2006. ISSN 1741-0126. doi: 10.1093/protein/gzl045. URL
    http://dx.doi.org/10.1093/protein/gzl045.
Diwahar Narasimhan, Mark R. Nance, Daquan Gao, Mei-Chuan Ko, Joanne Macdonald, Patricia Tamburi, Dan Yoon, Donald M. Landry, James H. Woods,
   Chang-Guo Zhan, John J. G. Tesmer, and Roger K. Sunahara. Structural analysis of thermostabilizing mutations of cocaine esterase. Protein
   Engineering Design and Selection, 23(7):537–547, July 2010. doi: 10.1093/protein/gzq025. URL http://dx.doi.org/10.1093/protein/gzq025.
Manish C. Saraf, Gregory L. Moore, Nina M. Goodey, Vania Y. Cao, Stephen J. Benkovic, and Costas D. Maranas. IPRO: an iterative computational protein
   library redesign and optimization procedure. Biophysical Journal, 90(11):4167–4180, June 2006. ISSN 0006-3495. doi: 10.1529/biophysj.105.079277. URL
   http://dx.doi.org/10.1529/biophysj.105.079277.
Jiangning Song, Kazuhiro Takemoto, Hongbin Shen, Hao Tan, Michael M. Gromiha, and Tatsuya Akutsu. Prediction of Protein Folding Rates from Structural
    Topology and Complex Network Properties. IPSJ Transactions on Bioinformatics, 3:40–53, 2010. doi: 10.2197/ipsjtbio.3.40. URL
    http://dx.doi.org/10.2197/ipsjtbio.3.40.




Computational Protein Design. 2. Computational Protein Design Techniques

  • 1. Computational Protein Design 2. Computational Protein Design Techniques Pablo Carbonell pablo.carbonell@issb.genopole.fr iSSB, Institute of Systems and Synthetic Biology Genopole, University d’Évry-Val d’Essonne, France mSSB: December 2010 Pablo Carbonell (iSSB) Computational Protein Design mSSB: December 2010 1 / 45
  • 2. Outline
      1 Introduction
      2 Computational Protein Descriptors
      3 Sequence-based CPD
      4 Structure-based CPD
      5 Search Algorithms in CPD
      6 De Novo Design
      7 Challenges in Sequence and Structure-Based CPD
  • 4. Computational Protein Design
  • 5. A Blueprint of CPD Approaches (∗ RS: research studies)
  • 7. Molecular Signature Descriptors
      A 2D representation of the molecular graph as an undirected colored graph G(V, E, C), with V: atoms, E: bonds, C: atom type.
      The signature descriptor of height h of atom x in the molecular graph G, written ${}^{h}\sigma(x)$, is a canonical representation of the subgraph of G containing all atoms that are at distance up to h from x.
      Atomic signature: ${}^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma(x)$ (1)
      The signature is a systematic codification of the molecular graph [Faulon et al., 2004].
      σ(methylcyclopropane) =
        1 [C]([H][C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H]))
        2 [C]([H][H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H]))
        1 [C]([H][H][H][C]([H][C]([H][H][C,0])[C,0]([H][H])))
        1 [H]([C]([C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H])))
        4 [H]([C]([H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H])))
        3 [H]([C]([H][H][C]([H][C]([H][H][C,0])[C,0]([H][H]))))
  • 8. Molecular Signature of Reactions and Proteins
      Signature of a reaction. The signature of the reaction R
        $S_1 + S_2 + \ldots + S_n \rightarrow P_1 + P_2 + \ldots + P_m$ (2)
      that transforms n substrates into m products is given by the difference between the signature of the products and the signature of the substrates:
        ${}^{h}\sigma(R) = \sum_{p \in P} {}^{h}\sigma(p) - \sum_{s \in S} {}^{h}\sigma(s)$ (3)
      Signature of protein sequences. The protein P is represented by the linear chain given by its collapsed graph at residue level, a reduced molecular graph representation G(V, E, C) known as the string signature, where V: residues a ∈ A, E: pairs of residues contiguous in sequence, C: amino acid type:
        ${}^{h}\sigma(P) = \sum_{a \in A} {}^{h}\sigma(a)$ (4)
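As a rough illustration of Eqs. (3) and (4), a signature can be held as a multiset of canonical strings and a reaction signature obtained by subtraction. The sketch below is a simplification, not the canonization procedure of Faulon et al.: `string_signature` is a hypothetical helper that uses plain sequence windows of radius h as stand-ins for the residue-level signatures, and `reaction_signature` assumes the molecular signatures are already available as `Counter` objects.

```python
from collections import Counter


def string_signature(seq, h):
    """Toy string signature: count the windows of radius h centred on each residue.

    Assumption: a plain sequence window stands in for the canonical
    height-h signature of a residue in the linear (collapsed) graph.
    """
    sig = Counter()
    for i in range(len(seq)):
        sig[seq[max(0, i - h): i + h + 1]] += 1
    return sig


def reaction_signature(substrates, products):
    """Eq. (3): sum of product signatures minus sum of substrate signatures.

    Both arguments are lists of Counter objects (one per molecule).
    Entries that cancel out are dropped from the result.
    """
    diff = Counter()
    for sig in products:
        diff.update(sig)
    for sig in substrates:
        diff.subtract(sig)
    return {key: count for key, count in diff.items() if count != 0}
```

Only the entries that change between substrates and products survive the subtraction, which is what makes the difference usable as a compact reaction descriptor.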
  • 9. Protein Contact Maps
      The protein contact map is a graph representation of the 3D interactions at residue level, G(V, E, C), where V: residues, E: contacts, C: amino acid type.
      Two residues are considered to interact when atoms of the two residues lie closer than a predetermined threshold (typically 4.5 to 5 Å).
      Contact maps can account for long-range interactions and conformational states.
      Figure from Song et al. [2010].
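The contact criterion above can be sketched in a few lines. This is a minimal illustration that assumes each residue is given as a list of (x, y, z) atom coordinates in Å; the function name `contact_map` and the default threshold of 4.5 Å are choices made for the example.

```python
import math


def contact_map(residues, threshold=4.5):
    """Return a symmetric boolean matrix of residue-residue contacts.

    residues: list of residues, each a list of (x, y, z) atom coordinates.
    Two residues are in contact when their closest pair of atoms is
    nearer than `threshold` (in the same units as the coordinates).
    """
    n = len(residues)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d_min = min(math.dist(a, b) for a in residues[i] for b in residues[j])
            contacts[i][j] = contacts[j][i] = d_min < threshold
    return contacts
```

The edge set E of the graph G(V, E, C) is then the set of index pairs (i, j) for which the matrix entry is true.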
  • 11. Sequence and Structure-Based CPD
      Sequence-based CPD methods are in some cases a good trade-off between the complexity of the model and the accuracy of the predictions.
  • 12. Sequence-based Knowledge-based Potentials
      The simplest way to score a protein and to identify active regions is through amino acid scales or indexes.
      AAindex is a database of:
      - 544 amino acid indexes
      - 94 amino acid matrices
      - 47 amino acid pair-wise contact potentials
      Examples: hydrophobicity, accessibility, van der Waals volume, secondary structure propensity, flexibility.
      This approach is widely used when analyzing conserved motifs and correlated mutations in protein fold families through multiple alignments.
  • 13. Quantitative Structure-Activity Relationship (QSAR) Techniques
      The goal is to model causal relationships between:
      - structures of interacting molecules
      - measurable properties of scientific or commercial interest, such as ADME/Tox (absorption, distribution, metabolism, excretion, and toxicity) of drugs
      QSAR is a statistical method used extensively by the chemical and pharmaceutical industries in small-molecule and peptide optimization.
  • 14. QSAR Model Evaluation
      Model predictability is generally evaluated through the leave-one-out (LOO) cross-validation correlation coefficient $q^2$.
      Partial least-squares (PLS) regression is commonly used.
      Additional nonlinear terms can be added through the use of nonlinear regression or machine learning techniques (kernel methods, random forests, etc.).
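To make the LOO statistic concrete, the sketch below computes $q^2 = 1 - \mathrm{PRESS} / \sum_i (y_i - \bar{y})^2$, one common definition, where PRESS is the sum of squared errors of the held-out predictions. For brevity it swaps PLS for ordinary least squares on a single descriptor; `fit_line` and `loo_q2` are names chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (one descriptor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total
```

A value close to 1 indicates that held-out samples are predicted well; unlike the fitted $r^2$, $q^2$ can become negative for a model that generalizes worse than the mean.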
  • 15. QSAR Modeling Workflow
  • 18. The ProSAR Algorithm
      An extension of SAR-based approaches to CPD.
      It formalizes the decision-making processes about which mutations to include in combinatorial libraries:
        $y = \sum_{i=1}^{N} \sum_{j \in A} c_{ij} x_{ij}$ (5)
      - $y$: the predicted function (activity) of the protein sequence
      - $c_{ij}$: the regression coefficients corresponding to the mutational effect of having residue j, among the 20 amino acids A, at position i
      - $x_{ij}$: binary variable indicating the presence or absence of residue j at position i
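Eq. (5) amounts to a one-hot encoding of the sequence followed by a dot product with the regression coefficients. The sketch below evaluates it directly; the dictionary layout `coeffs[(position, residue)]` and the function names are assumptions made for the example, and fitting the coefficients (for instance by PLS) is left out.

```python
def prosar_predict(sequence, coeffs):
    """Evaluate y = sum_i sum_j c_ij * x_ij for one sequence.

    x_ij is 1 only for the residue actually present at position i, so the
    double sum reduces to one coefficient lookup per position.
    Missing coefficients are treated as 0 (no measured effect).
    """
    return sum(coeffs.get((i, residue), 0.0) for i, residue in enumerate(sequence))


def beneficial_mutations(coeffs):
    """List (position, residue) pairs with a positive coefficient, best first."""
    positive = [(key, c) for key, c in coeffs.items() if c > 0]
    return [key for key, _ in sorted(positive, key=lambda item: -item[1])]
```

Ranking the coefficients in this way is one simple way to read the model as a guide to which mutations to carry into the next combinatorial library.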
  • 19. Improving Catalytic Function by ProSAR-driven Enzyme Evolution
      Statistical analysis of protein sequence-activity relationships.
      Bacterial biocatalysis of Atorvastatin (Lipitor), a cholesterol-lowering drug.
      Codexis Inc.
  • 21. Structure-based CPD
      - Energy functions and molecular force fields
      - Local conformational restrictions
      - Predicting entropic factors
      - Protein topological properties
      Figure from Narasimhan et al. [2010].
  • 22. Energy Functions and Molecular Force Fields
      In structure-based CPD, folds are usually represented by the spatial coordinates of the backbone atoms, or design scaffold.
      Protein design is done by placing amino acid side chains along the scaffold.
      Side chains are only permitted to assume a discrete set of statistically preferred conformations: rotamers.
      Rotamer/backbone and rotamer/rotamer interaction energies are tabulated.
      These potential energies can then be approximated by using any of the standard force fields: CHARMM, AMBER, GROMOS.
  • 23. Molecular Force Fields
      AMBER: a classical force field for energy and MD calculations:
        $V(r^N) = \sum_{\text{bonds}} \frac{1}{2} k_b (l - l_0)^2 + \sum_{\text{angles}} \frac{1}{2} k_a (\theta - \theta_0)^2 + \sum_{\text{torsions}} \frac{1}{2} V_n [1 + \cos(n\omega - \gamma)] + \sum_{j=1}^{N-1} \sum_{i=j+1}^{N} \left\{ \epsilon_{i,j} \left[ \left( \frac{r_{0ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{0ij}}{r_{ij}} \right)^{6} \right] + \frac{q_i q_j}{4 \pi \epsilon_0 r_{ij}} \right\}$ (6)
      1 $\sum_{\text{bonds}}(\cdot)$: energy between covalently bonded atoms.
      2 $\sum_{\text{angles}}(\cdot)$: energy due to the geometry of electron orbitals involved in covalent bonding.
      3 $\sum_{\text{torsions}}(\cdot)$: energy for twisting a bond due to bond order (e.g. double bonds) and neighboring bonds or lone pairs of electrons.
      4 $\sum_{j=1}^{N-1} \sum_{i=j+1}^{N}(\cdot)$: non-bonded energy between all atom pairs:
          1 van der Waals energies
          2 electrostatic energies
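The non-bonded double sum in Eq. (6) is the term that dominates the cost of a force-field evaluation. The sketch below evaluates it for a small set of point atoms; it assumes distances in Å, charges in units of e, and energies in kcal/mol, with `coulomb_k` (approximately 332 kcal·Å/(mol·e²)) standing in for $1/(4\pi\epsilon_0)$. The function name and argument layout are illustrative and not taken from any AMBER software.

```python
import math


def nonbonded_energy(coords, charges, eps, r0, coulomb_k=332.06):
    """Sum of Lennard-Jones and Coulomb terms over all atom pairs (Eq. 6).

    coords:  list of (x, y, z) positions
    charges: list of partial charges q_i
    eps:     matrix of well depths epsilon_ij
    r0:      matrix of equilibrium distances r0_ij
    coulomb_k: assumed conversion constant for 1 / (4 * pi * eps0)
    """
    total = 0.0
    n = len(coords)
    for j in range(n - 1):
        for i in range(j + 1, n):
            r = math.dist(coords[i], coords[j])
            ratio6 = (r0[i][j] / r) ** 6
            lennard_jones = eps[i][j] * (ratio6 ** 2 - 2.0 * ratio6)
            coulomb = coulomb_k * charges[i] * charges[j] / r
            total += lennard_jones + coulomb
    return total
```

In this 12-6 form the van der Waals term reaches its minimum, $-\epsilon_{i,j}$, exactly at $r_{ij} = r_{0ij}$, which is a convenient sanity check for any implementation.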
  • 24. Structure-based Knowledge-based Potentials
      They are built by performing a large-scale statistical study of structural databases such as the PDB (Protein Data Bank).
      - Rotamer libraries (∼150 rotameric states)
      - Binary patterning: only some types of amino acids are allowed, depending on the hydrophobic environment
      - An implicit solvation model
      - Secondary structure propensity
      - Frequency of small segments in the PDB
      - Pairwise potentials
      - van der Waals interactions
      - Hydrogen bonding
      - Electrostatics
      - Entropy-based penalties for flexible side chains
      Figure from Boas and Harbury [2007].
  • 25. Energy Functions
      Design along the backbone or scaffold.
      Rotamer/backbone and rotamer/rotamer interaction energies are tabulated.
      They are precomputed from molecular force fields: CHARMM, AMBER, GROMOS.
      Total energy of the protein:
        $E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l)$ (7)
      - $N$: length of the protein
      - $r_k$: the rotamer of the kth side chain
      - $E_k(r_k)$: the self-energy of a particular rotamer $r_k$
      - $E_{kl}(r_k, r_l)$: the pair energy of rotamers $r_k$, $r_l$
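With the energies tabulated, Eq. (7) becomes a table lookup. The sketch below assumes `self_e[k][r]` holds the self-energy of rotamer r at position k and `pair_e[(k, l)][r][s]` (with k < l) the pair energy, and counts each unordered pair once; both the layout and that convention are assumptions of the example. `brute_force` enumerates every assignment and is only feasible for very small problems.

```python
from itertools import product


def total_energy(assignment, self_e, pair_e):
    """E_TOT of Eq. (7) for one rotamer assignment (one rotamer index per position)."""
    n = len(assignment)
    energy = sum(self_e[k][assignment[k]] for k in range(n))
    for k in range(n):
        for l in range(k + 1, n):
            # Each unordered pair (k, l) is counted once (assumed convention).
            energy += pair_e[(k, l)][assignment[k]][assignment[l]]
    return energy


def brute_force(self_e, pair_e):
    """Return the lowest-energy assignment by exhaustive enumeration."""
    choices = [range(len(energies)) for energies in self_e]
    return min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))
```

Because the model is pairwise-additive, all later search strategies only ever need these two tables, never the 3D coordinates themselves.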
  • 26. The Role of Dynamics
      Besides protein structure, protein dynamics can play a direct role in molecular recognition.
      Flexible proteins recognize their targets through induced fit or conformational selection, likely showing promiscuity.
      Binding is commonly enthalpy-driven, but in some cases entropy is important, for instance:
      - Proteins with multiple binding sites
      - Small hydrophobic molecules
      Two types of sources of protein motion:
      - Protein flexibility: intraconformational dynamics (fast time scale motions)
      - Conformational heterogeneity: interconformational dynamics
      Gibbs free energy:
        $\Delta G = \Delta H - T \Delta S$ (8)
        $\Delta S = \Delta S_{solv} + \Delta S_{conf} + \Delta S_{rt}$ (9)
      - $\Delta S_{conf}$: conformational entropy of protein and ligand
      - $\Delta S_{rt}$: rotational and translational degrees of freedom
  • 27. Predicting Side-chain Dynamics from Structural Descriptors
      The Lipari-Szabo model-free approach makes it possible to quantify motions from NMR experiments by computing the generalized order parameter $S^2$.
      - Protein backbone dynamics: $^{15}$NH and $^{13}$C$^{\alpha}$H NMR relaxation methods
      - Protein side chain methyl dynamics: $^{13}$C$^{\alpha}$H NMR relaxation methods (side-chain motions in the picosecond-to-nanosecond time regime)
      From the BMRB we compiled $S^2$ data for 18 proteins, including 10 proteins in 2 or more different states: calmodulin, barnase, pdz, mup, dfhr, staphylococcal nuclease, pin1, sh3 domain, MSG.
      This technique provides only measurements for the Cα of methyl groups in side chains: ALA, LEU, ILE, MET, THR, VAL.
  • 28. Structural Descriptors of Methyl Dynamics
      We consider the following parameters influencing side-chain dynamics:
      - Packing density at the methyl site i and its neighboring residues j within a sphere of r = 5 Å:
          $P_i = \sum_{r_{ij} < 5\,\text{Å}} C_j e^{-r_{ij}} = \sum_{r_{ij} < 5\,\text{Å}} \left( \sum_{r_{jk} < 5\,\text{Å}} e^{-r_{jk}} \right) e^{-r_{ij}}$ (10)
      - Side chain stiffness: number of dihedral angles separating the backbone from the methyl carbon, weighted by the side-chain packing
      - Rotameric state: angular distance $\Delta\chi = \chi - \chi_0$ to the closest rotameric state $\chi_0$ in the library
      - Elongation: distance from the methyl site to the Cα
      - Pairwise contact potential: a knowledge-based potential of the frequency of contacts between residues at several distances, computed from the PDB
      - Solvation effect: DSSP accessibility and residue hydrophobicity
      - Van der Waals contacts
      - Hydrogen bonds (in the case of threonine)
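Eq. (10) is a two-level neighbor sum: each neighbor j of the methyl site contributes its own local crowding $C_j$, damped by its distance to the site. The sketch below evaluates it over a list of 3D points. Two conventions are assumed because the slide does not state them: a point is never its own neighbor (j ≠ i, k ≠ j), and the site i itself does count among the neighbors k of j.

```python
import math


def local_crowding(points, j, cutoff=5.0):
    """C_j: sum of exp(-r_jk) over all other points k within the cutoff of j."""
    total = 0.0
    for k, point in enumerate(points):
        if k == j:
            continue  # assumed: a point is not its own neighbor
        r_jk = math.dist(points[j], point)
        if r_jk < cutoff:
            total += math.exp(-r_jk)
    return total


def packing_density(points, i, cutoff=5.0):
    """P_i of Eq. (10) for the site with index i."""
    total = 0.0
    for j, point in enumerate(points):
        if j == i:
            continue
        r_ij = math.dist(points[i], point)
        if r_ij < cutoff:
            total += local_crowding(points, j, cutoff) * math.exp(-r_ij)
    return total
```

The exponential weights mean that close, densely surrounded neighbors dominate the descriptor, while anything beyond the 5 Å sphere contributes nothing.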
  • 29. Predicting Methyl Side-chain Dynamics
      Algorithm: neural network
      Cross-validation: r = 0.71 ± 0.029 (p-value = $4.6 \times 10^{-87}$)
      Example: experimental and predicted changes in $\Delta S^2$ of barnase after binding barstar ($\Delta S^2 > 0$: rigidification; $\Delta S^2 < 0$: flexibilization)

      Protein      MD method    r (MD)   r (nnet)
      ubiquitin    AMBER99SB    0.81     0.81
      TNfn3        CHARMM 22    0.62     0.79
      FNfn10       CHARMM 22    0.51     0.64
      barnase      OPLS-AA/L    0.55     0.64
      calmodulin   FDPB         0.60     0.72

      [Carbonell and del Sol, 2009]
  • 31. Search Algorithms in CPD
  • 32. Search Algorithms
      Objective: finding the best design within the space of all possible amino acid/rotameric states.
      A vast search space: $20^N$ or $p^N$
      - N: number of positions to mutate
      - p: number of rotameric states
      Strategies:
      - Deterministic algorithms
          Dead-end elimination (DEE) algorithm: a pruning method.
          Some accelerations of the DEE algorithm: upper-bound estimation; the “magic bullet” metric; conformational splitting; background optimization
      - Stochastic algorithms
          Monte Carlo
          Simulated annealing
          Genetic algorithms
  • 33. The DEE Algorithm
      It assumes that the energy of the protein can be written as
        $E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l)$ (11)
      - $N$: length of the protein
      - $r_k$: the rotamer of the kth side chain
      - $E_k(r_k)$: the self-energy of a particular rotamer $r_k$
      - $E_{kl}(r_k, r_l)$: the pair energy of the rotamers $r_k$, $r_l$
      Complexity:
      - Single search scales quadratically with the total number of rotamers: $O((p \times N)^2)$
      - Pair search scales cubically: $O((p \times N)^3)$
      - Brute force enumeration: $O(p^N)$
  • 34. The DEE Algorithm
      Single rotamers and rotamer pairs are eliminated during the computational cycles.
      Single elimination: eliminate rotamer $r_k^A$ if some other rotamer $r_k^B$ at the same position gives a lower energy whatever rotamers are chosen at the other positions, that is, if the best case for A is still worse than the worst case for B:
        $E_k(r_k^A) + \sum_{l=1, l \neq k}^{N} \min_X E_{kl}(r_k^A, r_l^X) > E_k(r_k^B) + \sum_{l=1, l \neq k}^{N} \max_X E_{kl}(r_k^B, r_l^X)$ (12)
      Pairs elimination: eliminate a pair of rotamers at two positions if there exists another pair that gives a better energy:
        $U_{kl}^{AB} \overset{\text{def}}{=} E_k(r_k^A) + E_l(r_l^B) + E_{kl}(r_k^A, r_l^B)$ (13)
        $U_{kl}^{AB} + \sum_{i=1}^{N} \min_X \left( E_{ki}(r_k^A, r_i^X) + E_{li}(r_l^B, r_i^X) \right) > U_{kl}^{CD} + \sum_{i=1}^{N} \max_X \left( E_{ki}(r_k^C, r_i^X) + E_{li}(r_l^D, r_i^X) \right)$ (14)
      where the sums run over the positions i other than k and l.
      Values are precomputed and stored in energy matrices.
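The single-elimination criterion of Eq. (12) can be sketched as a loop that keeps pruning until no rotamer can be removed. This is a minimal illustration of the singles criterion only, without pairs elimination or any of the accelerations; the table layout (`self_e[k][r]`, `pair_e[(k, l)][r][s]` with k < l) is an assumption of the example.

```python
def dee_singles(self_e, pair_e):
    """Repeatedly apply single elimination; return the surviving rotamers per position."""
    n = len(self_e)
    alive = [set(range(len(energies))) for energies in self_e]

    def pair(k, a, l, x):
        # Look up E_kl(r_k^a, r_l^x) in a table stored only for k < l.
        return pair_e[(k, l)][a][x] if k < l else pair_e[(l, k)][x][a]

    def bound(k, r, pick):
        # Self-energy plus the best (min) or worst (max) pair energy per position.
        return self_e[k][r] + sum(
            pick(pair(k, r, l, x) for x in alive[l]) for l in range(n) if l != k
        )

    changed = True
    while changed:
        changed = False
        for k in range(n):
            for a in sorted(alive[k]):
                best_a = bound(k, a, min)
                # Eliminate a if some competitor b is better even in its worst case.
                if any(best_a > bound(k, b, max) for b in alive[k] - {a}):
                    alive[k].discard(a)
                    changed = True
    return alive
```

Since a rotamer is only removed when a surviving competitor provably beats it, no position is ever left empty and the global minimum-energy assignment is preserved. On small problems the pruning can leave a single rotamer per position, in which case no further search is needed.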
  • 35. Stochastic Algorithms
      Search in the space of feasible designs by making a series of combinations of random and directed moves.
      - Monte Carlo Metropolis: a move consists of exchanging one rotamer for another at a randomly chosen position; a modification is always accepted if it lowers the energy, and otherwise accepted with the Boltzmann probability $e^{-\Delta E / kT}$
      - Simulated annealing starts at a high temperature, which allows the search to accept worse solutions and explore broadly in the initial cycles, and then cools down gradually
      - Genetic algorithms: a population of models is propagated (evolved) throughout the course of the run, and genetic operators, such as recombination, are used to create new models from existing parents
      They are fast and can be scaled up to problems of large complexity.
      They are not guaranteed to converge to the optimal solution.
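A Metropolis search with a cooling schedule can be sketched as follows. The geometric temperature schedule, the default parameter values, and the function name are choices made for this example; `energy` is any callable that scores a list of rotamer indices, and temperatures are expressed in the same units as the energies (so k = 1).

```python
import math
import random


def simulated_annealing(energy, n_rotamers, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with geometric cooling from t_start to t_end.

    energy:     callable taking a list of rotamer indices, returning a float
    n_rotamers: number of rotamers available at each position
    Returns the best assignment seen and its energy.
    """
    rng = random.Random(seed)
    state = [rng.randrange(p) for p in n_rotamers]
    current = energy(state)
    best, best_e = list(state), current
    for step in range(steps):
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        k = rng.randrange(len(state))           # pick a random position
        old = state[k]
        state[k] = rng.randrange(n_rotamers[k])  # propose a random rotamer there
        proposed = energy(state)
        delta = proposed - current
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed
            if current < best_e:
                best, best_e = list(state), current
        else:
            state[k] = old  # reject the move
    return best, best_e
```

Keeping track of the best state seen, rather than returning the final one, is a cheap safeguard against late uphill moves; it does not change the fact that the result may be a local minimum.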
  • 36. The SCHEMA Algorithm
      Equivalent to an in silico directed evolution.
      Consists of scoring libraries of hybrid protein sequences against the parental sequences.
      Scoring:
      - Calculate the number of interactions between residues (contacts within 4.5 Å) that are disrupted in the creation of hybrid proteins
      - Hybrids are scored for stability by counting the number of disruptions
      The protein is partitioned into blocks that should not be interrupted by crossovers (analogous to genetic algorithms).
      Figure from [Meyer et al., 2006].
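The disruption count can be sketched as a loop over the contact list. The criterion used here is an assumption about how "disrupted" is decided: a contact (i, j) counts as disrupted when the residue pair found in the hybrid at those positions does not occur at the same positions in any parent. The sequences are assumed to be aligned and of equal length, and `contacts` would come from a contact map of a parental structure.

```python
def schema_disruption(hybrid, parents, contacts):
    """Count contacts whose residue pair in the hybrid is absent from every parent.

    hybrid:   hybrid sequence (string)
    parents:  list of aligned parental sequences of the same length
    contacts: iterable of (i, j) index pairs in contact in the structure
    """
    disrupted = 0
    for i, j in contacts:
        pair = (hybrid[i], hybrid[j])
        if all((parent[i], parent[j]) != pair for parent in parents):
            disrupted += 1
    return disrupted
```

Crossover points that keep contacting residues inside the same inherited block leave the count low, which is why the block partition is chosen to minimize it.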
  • 37. The OPTCOM and IPRO Algorithms for Library Design
      The OPTCOM algorithm:
      - Balances size and quality of the library
      The IPRO algorithm:
      - Identifies point mutations in the parent sequences using energy-based scoring functions
      - Residue and rotamer choices are driven by a mixed-integer linear programming (MILP) formulation
      Figure from [Saraf et al., 2006].
  • 38. Some Web Resources
      - IPRO: Iterative Protein Redesign and Optimization. http://maranas.che.psu.edu/IPRO.htm
      - EGAD: A Genetic Algorithm for protein Design. http://egad.ucsd.edu/software.php
      - RosettaDesign: A software package. http://rosettadesign.med.unc.edu/
      - SCHEMA: A pair-wise energy function for scoring protein chimeras made from homologous proteins. http://www.che.caltech.edu/groups/fha/schema-tools/schema-overview.html
      - SHARPEN: Systematic Hierarchical Algorithms for Rotamers and Proteins on an Extended Network. http://koko.che.caltech.edu/sharpenabout.html
      - WHAT IF: Software for protein modelling, design, validation, and visualisation. http://swift.cmbi.ru.nl/whatif/
      - FoldX: A force field for energy calculations and protein design. http://foldx.crg.es/
  • 40. De Novo-Designed Proteins
      In de novo designs, some assumptions are needed in order to make the search space tractable.
      Usually we start from some basic motifs or domains as scaffolds for the design. Examples:
      - βαβ motif resembling a zinc finger
      - 3 and 4 helix bundles
      - Helical coiled-coils
      Helix bundle motifs can be parametrized using a few global variables that describe the global structure.
      Applications:
      - New metal-binding sites
      - Nonbiological cofactors for novel biomaterials and electromechanical devices
      - Novel enzymatic activities
  • 41. Example: De Novo Design of a Metalloprotein
      Computational de novo design of a four-helix bundle (108 residues) containing the non-biological cofactor iron diphenyl porphyrin (DPP-Fe) [Bender et al., 2007].
      - The initial helix bundle was selected as a low-energy structure computed with MCSA
      - STITCH: a program to select loops connecting helices from the PDB
      - CHARMM and PROCHECK for removing overlaps
      - Select 4 His and 4 Thr residues to support the 6-point coordination of the Fe(III) cations
      - SCADS: provides site-dependent amino acid probabilities in each round
  • 43. Challenges in Sequence and Structure-Based CPD
      Modeling:
      - Greater availability of 3D protein structural information
      - More accurate energy functions
      - Improvement of rigid and flexible docking
      Design:
      - Improvement in search algorithms
      - Parametrization for non-natural amino acids
      Prediction:
      - Beyond additive models: using machine-learning algorithms
      - More complete environment descriptors
  • 45. Bibliography I
      - Gretchen M. Bender, Andreas Lehmann, Hongling Zou, Hong Cheng, H. Christopher Fry, Don Engel, Michael J. Therien, J. Kent Blasie, Heinrich Roder, Jeffrey G. Saven, and William F. DeGrado. De Novo Design of a Single-Chain Diphenylporphyrin Metalloprotein. Journal of the American Chemical Society, 129(35):10732–10740, September 2007. ISSN 0002-7863. doi: 10.1021/ja071199j. URL http://dx.doi.org/10.1021/ja071199j.
      - F. Edward Boas and Pehr B. Harbury. Potential energy functions for protein design. Current Opinion in Structural Biology, 17(2):199–204, April 2007. ISSN 0959-440X. doi: 10.1016/j.sbi.2007.03.006. URL http://dx.doi.org/10.1016/j.sbi.2007.03.006.
      - Pablo Carbonell and Antonio del Sol. Methyl side-chain dynamics prediction based on protein structure. Bioinformatics, pages btp463+, July 2009. doi: 10.1093/bioinformatics/btp463. URL http://dx.doi.org/10.1093/bioinformatics/btp463.
      - Jean-Loup L. Faulon, Michael J. Collins, and Robert D. Carr. The signature molecular descriptor. 4. Canonizing molecules using extended valence sequences. Journal of Chemical Information and Computer Sciences, 44(2):427–436, 2004. ISSN 0095-2338. doi: 10.1021/ci0341823. URL http://dx.doi.org/10.1021/ci0341823.
      - Michelle M. Meyer, Lisa Hochrein, and Frances H. Arnold. Structure-guided SCHEMA recombination of distantly related β-lactamases. Protein Engineering Design and Selection, 19(12):563–570, December 2006. ISSN 1741-0126. doi: 10.1093/protein/gzl045. URL http://dx.doi.org/10.1093/protein/gzl045.
      - Diwahar Narasimhan, Mark R. Nance, Daquan Gao, Mei-Chuan Ko, Joanne Macdonald, Patricia Tamburi, Dan Yoon, Donald M. Landry, James H. Woods, Chang-Guo Zhan, John J. G. Tesmer, and Roger K. Sunahara. Structural analysis of thermostabilizing mutations of cocaine esterase. Protein Engineering Design and Selection, 23(7):537–547, July 2010. doi: 10.1093/protein/gzq025. URL http://dx.doi.org/10.1093/protein/gzq025.
      - Manish C. Saraf, Gregory L. Moore, Nina M. Goodey, Vania Y. Cao, Stephen J. Benkovic, and Costas D. Maranas. IPRO: an iterative computational protein library redesign and optimization procedure. Biophysical Journal, 90(11):4167–4180, June 2006. ISSN 0006-3495. doi: 10.1529/biophysj.105.079277. URL http://dx.doi.org/10.1529/biophysj.105.079277.
      - Jiangning Song, Kazuhiro Takemoto, Hongbin Shen, Hao Tan, Michael M. Gromiha, and Tatsuya Akutsu. Prediction of Protein Folding Rates from Structural Topology and Complex Network Properties. IPSJ Transactions on Bioinformatics, 3:40–53, 2010. doi: 10.2197/ipsjtbio.3.40. URL http://dx.doi.org/10.2197/ipsjtbio.3.40.