2. • Basic Protein Biochemistry, rDNA & Protein Aggregation
• Structure-function relationships of biological macromolecules, in particular
proteins, provide crucial insights for fundamental biochemistry, medical research
and early drug discovery.
• However, production of recombinant proteins, either for structure determination,
functional studies, or to be used as biopharmaceutical products, is often hampered
by their instability and propensity to aggregate in solution in vitro.
• Protein samples of poor quality are often associated with reduced reproducibility as
well as high research and production expenses.
• Several biophysical methods are available for measuring protein aggregation and
stability.
• Yet, discovering and developing means to improve protein behaviour and structure-
function integrity remains a demanding task.
• Workflows to address this are made possible by adapting established biophysical
methods to high-throughput screening approaches.
• Rapid identification and optimisation of conditions that promote protein stability
and reduce aggregation will help researchers and industry maximise sample
quality, stability and reproducibility, thereby reducing research and development
time and costs.
3. • It has long been recognised that protein aggregation pervades
human morbidity and mortality and impinges on our ability to
produce life-saving and life-changing protein therapeutics both
rapidly and economically.
• It is now widely understood that, as well as adopting soluble,
functional structures, many proteins can also self-assemble, forming
structured aggregates such as amyloid fibrils, or undergo
liquid-liquid phase separation.
• The latter process drives the formation of membraneless organelles
that can be functional (such as in the nucleolus) or causative of
cellular dysfunction and disease (such as in virus replication or in
protein aggregation disorders) (Figure 1).
• The ability of proteins to catalyse reactions, to form stable scaffolds,
and to bind ligands tightly and with high specificity, has enormous
potential for the use of proteins in industry.
4. The precursor of aggregation may be the unfolded, partially folded or native state of a
protein. During amyloid formation, oligomeric species formed from the initial
aggregation-prone monomer can then assemble further to form higher-order
oligomers, one or more of which can form a nucleus, which, by rapidly recruiting other
monomers, can nucleate assembly into protofibrils and amyloid fibrils.
As fibrils grow, they can fragment, yielding more fibril ends that are capable of elongation
by the addition of new aggregation-prone species. Alternatively, amorphous aggregation
can occur via one or more aggregation-prone species growing into larger species.
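The nucleation, elongation and fragmentation steps described on this slide can be illustrated with a minimal kinetic sketch. The two-variable moment model below, the rate constants (k_n, k_plus, k_frag), the nucleus size n_c and the simple Euler integration are all illustrative assumptions, not equations or values taken from this deck.

```python
def simulate_fibril_mass(m_tot=1.0, k_n=1e-4, k_plus=1.0, k_frag=1e-2,
                         n_c=2, dt=0.01, steps=5000):
    """Toy model of nucleated fibril growth with fragmentation.

    P: number concentration of fibrils (each fibril has two growing ends)
    M: concentration of monomer units incorporated into fibrils
    All parameters are arbitrary, dimensionless placeholders (assumptions).
    Returns the list of M values at each time step.
    """
    P, M = 0.0, 0.0
    trace = [M]
    for _ in range(steps):
        m = max(m_tot - M, 0.0)                  # free aggregation-prone monomer
        nucleation = k_n * m ** n_c              # new nuclei formed from monomers
        dP = nucleation + k_frag * M             # fragmentation creates new fibrils
        dM = n_c * nucleation + 2.0 * k_plus * m * P  # elongation at both ends
        P += dP * dt
        M = min(M + dM * dt, m_tot)              # fibril mass cannot exceed total protein
        trace.append(M)
    return trace


if __name__ == "__main__":
    trace = simulate_fibril_mass()
    for step in range(0, len(trace), 500):
        print(f"t = {step * 0.01:5.1f}   fibril mass fraction = {trace[step]:.4f}")
```

With these placeholder parameters the fibril mass stays low during an initial lag phase, then rises steeply as fragmentation multiplies the number of growing ends, and finally levels off as free monomer is depleted.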
5. • Delineating aggregation mechanisms using rational protein
engineering methods
• Rational redesign (i.e. the substitution of a small number of residues
in a protein sequence with those having the desired physico-chemical or
spatial properties) is an attractive approach to modulate protein
aggregation when there is prior knowledge of the mechanism of
aggregation (Figure 2), e.g. by altering a protein–protein interface
required for aggregation.
• Approaches such as alanine scanning can also be used to identify or
confirm predictions of residues key to the control of aggregation.
• The ability to identify ‘aggregation hotspots’ has been facilitated by
the development of at least 40 different algorithms.
6. • Detecting aggregation-prone regions in
primary sequences
• TANGO identifies aggregation-prone regions (APRs)
by calculating the propensity of penta-peptide
sequences to form buried β-sheets, using an
algorithm trained on experimental measurements.
9. • How does protein aggregation occur?
• Protein aggregation occurs through a variety of
mechanisms, initiated by the unfolded, non-native, or even
the native state itself.
• Understanding the molecular mechanisms of protein
aggregation is challenging, given the array of competing
interactions that control solubility, stability, cooperativity
and aggregation propensity.
• An array of methods has been developed to interrogate
protein aggregation, ranging from computational algorithms
able to identify aggregation-prone regions to deep
mutational scanning, which defines the entire mutational
landscape of a protein’s sequence.
10. • Protein Aggregation: What happens?
• Small molecule aggregates are a leading cause of artifacts in early drug discovery,
but little is known about their interactions with proteins, nor why some proteins
are more susceptible to inhibition than others.
• A possible reason for this apparent selectivity is that aggregation-based inhibition,
as a stoichiometric process, is sensitive to protein concentration, which varies
across assays.
• Alternatively, local protein unfolding by aggregates may lead to selectivity since
stability varies among proteins.
• To deconvolute these effects, differentially stable point mutants of a single protein,
TEM-1 β-lactamase, have been used.
• Broadly, destabilized mutants had higher affinities for aggregates, and were more
potently inhibited by them, than more stable variants.
• Addition of the irreversible inhibitor moxalactam destabilized several mutants, and
these typically bound tighter to a colloidal particle, while the only mutant it
stabilized bound weaker.
• These results suggest that less stable enzymes are more easily sequestered and
inhibited by colloidal aggregates.
11. • Aggregation is one of the most prevalent means of degradation for biopharmaceutical
products, and protein therapeutics are susceptible to this phenomenon at various stages, from
production to administration.
• Aggregation is a long-standing problem in biopharmaceutical manufacturing processes, as it
may lead to lower yields, loss of biological activity, reduced target binding affinity and
unwanted immunogenic responses in patients.
• Aggregates have also been linked to reduced drug effectiveness.
• Aggregation must be strictly controlled to fulfill the specifications set by regulatory
agencies.
• In addition, mitigating aggregation is desirable for patient safety, shortens
manufacturing processes, and lowers costs for companies and patients.
• At least 41 human diseases are associated with in vivo aggregation, including Alzheimer’s
disease, spongiform encephalopathies, and type II diabetes.
• Moreover, there is interest in employing amyloid fibrils in materials science and other
nanotechnology applications.
• As a result, the ability to control and predict the formation and structure of nonnative
aggregates is potentially of interest to a broad audience.
12. • Protein Aggregation: What happens?
• Protein stability effects in aggregate-based enzyme inhibition
14. • Recent advances in this exciting and emerging
field focus on protein engineering approaches
that, together with improved computational
methods, hold promise to predict and control
protein aggregation linked to human disease, as
well as to facilitate the manufacture of
protein-based therapeutics.
15. Illustrative overview of nonnative aggregation pathways.
Step 1: partial or full unfolding or misfolding of natively folded monomers (F), resulting
in exposure of the hot-spot peptide sequences or aggregation-prone regions (red).
Step 2: reversible self-association of protein monomers.
Step 3: creation of the smallest net-irreversible species (sometimes termed nuclei) via
strong intermolecular noncovalent contacts between accessible hot spots.
Step 4: aggregate growth. Double arrows represent reversible steps; single arrows
represent effectively irreversible ones.
16. • Modulation of protein stability and aggregation properties by surface
charge engineering
• Altering protein surface charges through traditional protein engineering
approaches often affects the native protein structure significantly and
induces misfolding.
• This limitation is a major hindrance in modulating protein properties
through surface charge variations.
• In this study, as a strategy to overcome such a limitation, we attempted to
co-introduce stabilizing mutations that can neutralize the destabilizing
effect of protein surface charge variation.
• Two sets of rational mutations were designed:
• one to increase the number of surface charged amino acids, by mutating
surface polar uncharged amino acids, and
• the other to decrease the number of surface charged amino acids, by
mutating charged amino acids.
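The effect of the two mutation sets on charge can be tallied with a short sketch. The eight-residue peptide, the mutation labels and the simple counting rule (Lys/Arg as +1, Asp/Glu as -1, His ignored) are hypothetical illustrations, not the GFP design from the study.

```python
POSITIVE = set("KR")   # treated as +1 near neutral pH (simplifying assumption)
NEGATIVE = set("DE")   # treated as -1 near neutral pH (simplifying assumption)


def count_charged(sequence):
    """Number of charged residues (K, R, D, E) in a one-letter sequence."""
    return sum(res in POSITIVE or res in NEGATIVE for res in sequence.upper())


def net_charge(sequence):
    """Crude net charge: (#K + #R) - (#D + #E); histidine and termini are ignored."""
    seq = sequence.upper()
    return sum(res in POSITIVE for res in seq) - sum(res in NEGATIVE for res in seq)


def apply_mutations(sequence, mutations):
    """Apply point mutations written like 'S2K' (wild type, 1-based position, new residue)."""
    residues = list(sequence.upper())
    for mut in mutations:
        wild, position, new = mut[0], int(mut[1:-1]), mut[-1]
        if residues[position - 1] != wild:
            raise ValueError(f"{mut}: position {position} is {residues[position - 1]}, not {wild}")
        residues[position - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    toy = "ASTKDENQ"                                  # hypothetical surface patch
    more_charged = apply_mutations(toy, ["S2K", "N7R"])  # polar uncharged -> charged
    less_charged = apply_mutations(toy, ["K4S"])          # charged -> polar uncharged
    for name, seq in [("toy", toy), ("more", more_charged), ("less", less_charged)]:
        print(name, seq, "charged:", count_charged(seq), "net:", net_charge(seq))
```

Counting residues this way says nothing about folding or stability; it only tracks how each mutation set shifts the number of charged side chains.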
17. • These two sets of mutations were introduced into Green
Fluorescent Protein (GFP) with or without stabilizing
mutations.
• The co-introduction of stabilizing mutations along with mutations
for surface charge modification allowed us to obtain functionally
active protein variants (s-GFP(+15–17) and s-GFP(+5–6)).
• When protein properties such as fluorescence activity, folding
rate and kinetic stability were assessed, the results indicated
that protein stability can potentially be modulated independently
of activity and folding by engineering protein surface charges.
• The aggregation properties of GFP could also be altered through
surface charge engineering.
18. • Design, molecular modeling and molecular dynamics
simulation of the GFP mutants
• Cloning and expression analysis
• Protein purification
• Fluorescence measurement
• Refolding kinetics
• Thermal effect on protein stability
• Effect of the chemical denaturant on protein stability
• Effect of elevated temperature on protein aggregation
properties
19. • Results
• s-GFP(+5–6), a GFP variant with a lower charge number, showed
lower stability and higher aggregation propensity than the control
GFP (s-GFP(+10–13)),
• but the properties of the GFP variant with a higher number of
charges (s-GFP(+15–17)) were similar to those of the control GFP.
• These results suggest that the surface charges are important factors
in the conformational stability and dispersion of proteins in water,
as is generally recognized.
• The stabilizing mutations for GFP are well known, and the stabilized
s-GFP(+10–13) was predicted to be very stable compared to normal
GFP.
• We expected that the GFP system could be the best model to
demonstrate our basic concept explicitly, because it allows the
surface charge number to be varied more dramatically.
21. • Driving Forces for Nonnative Protein Aggregation and
Approaches to Predict Aggregation-Prone Regions
• Nonnative protein aggregation is the process by which
otherwise folded, monomeric proteins are converted to
stable aggregates composed of protein chains that have
undergone some degree of unfolding.
• Often, a conformational change is needed to allow certain
sequences of amino acids— so-called aggregation-prone
regions (APRs)—to form stable interprotein contacts such
as β-sheet structures.
• In addition to APRs that are needed to stabilize aggregates,
other factors or driving forces are also important in
inducing aggregation in practice.
22. • Overview of the overall process and mechanistic
drivers for nonnative aggregation, followed by a
more detailed summary of the factors currently
thought to be important for determining which
amino acid sequences most strongly stabilize
nonnative protein aggregates,
• as well as a survey of many of the existing,
publicly available algorithms that attempt to
predict APRs.
23. • APPROACHES FOR PREDICTING AGGREGATION-PRONE REGIONS
• There is a growing number of computational approaches for predicting APRs in
polypeptides and proteins.
• This section presents a representative subset of published approaches or
algorithms so as to provide the general reader with a sense of the differences and
commonalities between the underlying assumptions of, inputs to, and outputs
from different algorithms.
• Zyggregator and CamSol
• Pawar et al. (49) used multivariate regression against experimental data for
aggregation of polypeptides to develop a semiquantitative expression for amyloid
aggregate propensity of unfolded polypeptides.
• The model included contributions from hydrophobicity, α-helical propensity,
β-sheet propensity, presence of hydrophobic/hydrophilic alternating patterns, and
the net charge of a polypeptide.
• The resulting equation can be implemented either for the whole peptide sequence
or at the level of individual residues and sliding windows of primary sequence.
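The idea of scoring sliding windows of primary sequence can be sketched with a single term. The example below averages the Kyte-Doolittle hydropathy scale over each window; this is a deliberately simplified stand-in chosen for illustration and is not the multi-term equation of Pawar et al., Zyggregator or any other predictor listed here. The window width of 5 is likewise an assumption.

```python
# Kyte-Doolittle hydropathy values per residue (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_profile(sequence, scale=KYTE_DOOLITTLE, window=5):
    """Mean per-residue score for every window of the given width."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[res] for res in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def top_window(sequence, window=5):
    """Return (1-based start, segment, score) of the highest-scoring window."""
    profile = sliding_window_profile(sequence, window=window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start + 1, sequence.upper()[start:start + window], profile[start]


if __name__ == "__main__":
    # Hypothetical toy sequence: a hydrophobic stretch flanked by charged residues.
    print(top_window("DDDDDVVVVVDDDDD"))
```

A real predictor would add further terms such as secondary-structure propensities, alternating hydrophobic/hydrophilic patterns and net charge, each with fitted weights.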
24. • The CamP method predicts local structural stability
based on native hydrogen-exchange experiments.
• Zyggregator, which integrates aggregation and local
stability predictions, includes α-helix and β-sheet
propensities, hydrophobicity, charge,
hydrophobic/hydrophilic patterns, and gatekeeping
effects of individual charges in its predictions.
• CamSol (55) is a protein solubility prediction tool that
employs the linear combination of the same variables
as Zyggregator, but the parameters and the definition
of the gatekeeping effect in CamSol are different from
those in the Zyggregator algorithm.
25. • PAGE
• The PAGE algorithm (57, 58) was developed with the goal of
predicting aggregation rates along with β-aggregate-prone regions.
• It is based on combining known physicochemical properties of
amino acids with computational simulations of β-aggregating
peptides.
• Aromaticity, β-propensity, charge, polar-nonpolar surfaces, and
solubility are the factors employed for APR identification.
• For aggregation rate predictions, temperature and polypeptide
concentration are also needed as inputs.
• PAGE also provides predictions of parallel/antiparallel β-sheet
organization in fibrils.
26. • Amyloidogenic Pattern
• This algorithm was based on experimental data
for a databank of peptides from full positional
scanning mutagenesis on the amyloidogenic
peptide STVIIE under solution conditions that
were a reasonable mimic of near-neutral or
highly acidic conditions.
• The resulting hexapeptides, incubated at room
temperature, were characterized with far-UV CD
(β-sheet population) at t = 0 and 1 month and
with EM (fibril formation) at t = 1 month.
27. • TANGO
• The TANGO (40, 60, 61) algorithm was developed to predict
β-aggregate-prone regions in polypeptides and is based in
part on statistical mechanical arguments.
• 3D Profile (ZipperDB)
• This method is based on the solved crystal structure of the
cross-β spine for aggregates of NNQQNY and GNNQQNY derived
from the Sup35 prion protein of Saccharomyces cerevisiae.
• Hexapeptide Conformational Energy/Pre-Amyl
• The Hexapeptide Conformational Energy/Pre-Amyl (63)
approach is another amyloid fibril propensity algorithm. It
uses the same basic approach as 3D Profile but is
computationally more efficient.
28. • AGGRESCAN AND AGGRESCAN3D
• AGGRESCAN (65) was developed using individual amino
acid aggregation propensities that were determined from
the following in vivo experiment.
• Fusion of green fluorescent protein (GFP) to insoluble
proteins or peptides impairs the folding of GFP in
Escherichia coli and results in a decrease in GFP
fluorescence in vivo (66).
• GFP was fused to Aβ42, the phenylalanine at position 19
of Aβ42 was mutated to all other 19 natural amino acids,
and the resulting in vivo fluorescence levels were measured.
• If substitution to an amino acid resulted in lower
fluorescence, then that amino acid was concluded to have a
higher aggregation propensity than phenylalanine, and vice
versa.
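The logic of this experiment can be sketched in two steps: enumerate the 19 substitutions at position 19, then compare each variant's fluorescence with the phenylalanine reference. The Aβ42 sequence below is the commonly cited one and is supplied from general knowledge rather than from this deck; the function names are hypothetical, and fluorescence values must come from the user's own measurements since none are given here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Commonly cited human Abeta42 sequence (assumption: not stated in the slides).
ABETA42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"


def saturation_variants(sequence, position):
    """All single substitutions at a 1-based position, keyed like 'F19A'."""
    wild = sequence[position - 1]
    return {
        f"{wild}{position}{aa}": sequence[:position - 1] + aa + sequence[position:]
        for aa in AMINO_ACIDS
        if aa != wild
    }


def propensity_relative_to_reference(fluorescence, reference="F"):
    """Classify each residue against the reference residue.

    fluorescence: dict mapping one-letter residue -> measured fluorescence
    of the GFP fusion carrying that residue at the scanned position.
    Lower fluorescence than the reference is read as higher aggregation
    propensity, following the rule described on the slide.
    """
    ref = fluorescence[reference]
    result = {}
    for aa, value in fluorescence.items():
        if aa == reference:
            continue
        if value < ref:
            result[aa] = "higher"
        elif value > ref:
            result[aa] = "lower"
        else:
            result[aa] = "equal"
    return result


if __name__ == "__main__":
    variants = saturation_variants(ABETA42, 19)
    print(len(variants), "variants, e.g.", sorted(variants)[:3])
```

The classification is only relative to phenylalanine at this one position; turning such measurements into a general per-residue scale involves further steps that are not described on the slide.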
29. • PASTA and PASTA 2.0
• PASTA (70, 71) is based on the idea that the
mechanisms and underlying physics that govern
β-sheet formation in native proteins also hold for
β-sheet formation in amyloid aggregates (e.g.,
cross-β structures).
• β-Sheets in native structures of globular proteins
were used to derive propensity scales for parallel
and antiparallel β-sheets.
30. • FoldAmyloid
• FoldAmyloid (80) uses expected packing density and probability of
hydrogen bond formation as the basis for predicting amyloidogenic
regions, including those rich in hydrophobic residues and in Asp
and/or Glu residues in protein sequences.
• NetCSSP
• It is possible for a given amino acid sequence to result in different
secondary structures within the context of a full protein, based on
its contact with other residues in the 3D structure of a protein.
• These types of sequences are termed chameleon sequences.
• Pafig
• Based on physicochemical properties of amino acids and assumptions
made on physical grounds.
31. • Spatial Aggregation Propensity and Developability
Index
• The spatial aggregation propensity (SAP) method (32)
focuses on the pattern of hydrophobic residues on the
solvent-exposed surface of the folded structure of a
protein.
• Waltz
• Waltz (93) focuses on predicting APRs using a scoring
function.
• AmyloidMutants
• AmyloidMutants (95) is a Boltzmann distribution
approach that seeks to predict amyloid-prone regions
and shifts in amyloid conformations of polypeptides.
32. • AbAmyloid
• AbAmyloid (97) is the first automatic and germline-independent algorithm specifically targeted
to antibody amyloidogenicity predictions.
• FishAmyloid
• FishAmyloid (98) focuses on predicting regions that are prone to forming amyloid fibrils.
• GAP
• GAP (99), or Generalized Aggregation Proneness, seeks to predict APRs in polypeptides, as
well as distinguish between amyloid-forming segments and amorphous segments.
• APPNN
• In 2015, Família et al. (101) developed APPNN for peptide or protein amyloidogenicity prediction
with a similar approach to that of Pafig. This tool was established on recursive feature selection
and feed-forward neural networks.
• ArchCandy
• ArchCandy (102) is based on the observation that several disease-related amyloid fibrils involve
β-arcades as part of their tertiary structures.
33. • Approaches for predicting intrinsic aggregation hot spots,
or aggregation-prone regions (APRs).
• These approaches motivate multiple rational protein design
strategies and utilize different types of computational tools
that predict factors in the overall aggregation process.
• The discussion of APR predictors focuses on publicly available, automated
hot-spot prediction algorithms to demonstrate the basic
principles thought to be significant in APR predictions and
to highlight similarities and differences among different
approaches.
• Finally, the main limitations of using conformational
stability, colloidal interactions, and APR predictors are
discussed.
34. • Cloning and site-directed mutagenesis.
• Periplasmic expression and purification of
TEM-1 via osmotic shock.
• Differential scanning fluorimetry (DSF).
• Fluorescence spectroscopy.
• Enzyme inhibition
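For the DSF step, one common way to read an apparent melting temperature from a melt curve is to take the temperature at which the first derivative of the fluorescence signal is largest. The sketch below illustrates that approach; whether the study summarised here analysed its DSF data this way is not stated on the slide, and the synthetic sigmoid used in the demonstration is a made-up curve, not measured data.

```python
import math


def melting_temperature(temps, signal):
    """Apparent Tm: temperature of the steepest rise in signal (max dF/dT).

    Uses central differences on interior points, so at least three
    readings are required. Assumes temps are strictly increasing.
    """
    if len(temps) != len(signal) or len(temps) < 3:
        raise ValueError("need at least three paired temperature/signal readings")
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


def synthetic_melt_curve(temp, tm, width=2.0, low=0.0, high=1.0):
    """Idealised sigmoidal unfolding transition, for demonstration only."""
    return low + (high - low) / (1.0 + math.exp((tm - temp) / width))


if __name__ == "__main__":
    temps = [25.0 + 0.5 * i for i in range(141)]            # 25 to 95 degrees C
    signal = [synthetic_melt_curve(t, tm=55.0) for t in temps]
    print("apparent Tm:", melting_temperature(temps, signal))
```

Comparing apparent Tm values between variants, or with and without a ligand, gives a simple stability ranking; real curves are noisier and usually need smoothing or curve fitting first.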