Comparing HIV-1 Epitopes with W-curves and TSP Clusters

Taking a walk on the W-side:
Comparing Epitopes on HIV-1
with the W-curve & TSP.

Douglas J. Cork (1,2,4), Steven Lembark (3), Bruce K. Brown (1,4),
Victoria R. Polonis (1,4), Jerome Kim (1,4), Nelson L. Michael (5)

(1) US Military HIV Research Program (MHRP)/Henry Jackson Foundation (HJF), Rockville, MD
(2) Illinois Institute of Technology, Chicago, IL
(3) Workhorse Computing, Woodhaven, NY
(4) Walter Reed Army Institute For Research, Rockville, MD
(5) Walter Reed Army Institute for Research, Washington, DC

Statistically, HIV-1 is a problem.
● One of the major problems in studying HIV-1 is
the apparent randomness of clinical response.
● Tests using clades based on genome sequences
show no correlation with immune response.
● Part of the answer may be clades based on
smaller, clinically specific sequences.
● HIV-1 mutates 10,000 times faster than people.
● Existing clades end up including too much white
noise to correlate well with anything.

The Structure of HIV-1
● gp120 is the primary focus for immune studies.
● gp120 and gp41 make up the envelope protein, gp160.

Standard Clades vs. Neutralization Data
● Standard clades of HIV-1 are based on
phylogenetic trees of the genome.
● They do not correlate well with neutralization data.
● Between-clade and within-clade results have similar variability.
● Antibody and cell studies have low correlation for
within-clade results.
● Lack of a correlation prevents developing any
broadly neutralizing treatments.
● Today we have to sequence the virus to treat it.

Example: Cross-clade neutralization shows no
useful pattern in Peripheral Blood Mononuclear
Cell or Pseudovirus Assay studies.
● Bubble plot.
● No real relationship.

Neutralization Heat Map
● Distribution of response to antibody pools lacks any
correlation with the standard clades.

HIV-1 Genetics Complicate Analysis
● Genes and proteins are normally reported with
respect to a single strain, HXB2.
● Hard to compare local features between strains.
● Need to rediscover them for each study.
● Neutralization data are specific to gp120.
● Variable regions in gp120 leave corresponding
locations in different samples off by tens of bases.
● Antibody binding sites (epitopes) are only a few
bases long, with a majority in the variable regions.

Another approach: W-curves
● The W-curve is based on chaos and game
theory.
● It abstracts a sequence of DNA into a
three-dimensional structure.
● Originally designed for visualization, we have now
adapted it for machine comparison.
● Geometric analysis of the curves allows for
piecewise comparison of the sequences.

The W-curve

● Start with a square at the origin and a discrete
Z-axis matching the sequence base numbers.
● Each point moves halfway towards the corner
for the next base.

● All curves start at (0,0,0).
● The curve (blue) moves halfway towards “C” then “G” (red lines).
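The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `w_curve` and the assignment of bases to the corners of the square are assumptions, since the slides do not say which base sits at which corner.

```python
# Hypothetical corner layout for a square centered on the origin.
# The slides do not specify which base goes on which corner.
CORNERS = {
    "A": (-1.0, -1.0),
    "C": (-1.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, -1.0),
}


def w_curve(sequence):
    """Return one (x, y, z) point per base, with z as the base number.

    The curve starts at the origin; each base moves the current (x, y)
    halfway towards that base's corner.
    """
    x, y = 0.0, 0.0
    points = []
    for z, base in enumerate(sequence.upper(), start=1):
        corner_x, corner_y = CORNERS[base]
        x = (x + corner_x) / 2.0
        y = (y + corner_y) / 2.0
        points.append((x, y, z))
    return points


# Halfway towards "C", then halfway towards "G".
for point in w_curve("CG"):
    print(point)
```

With this corner layout the two printed points are (-0.5, 0.5, 1) and (0.25, 0.75, 2). A different layout changes the coordinates but not the halfway rule.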

Autoregression
● Curves converge by base 7 after a SNP at base 3.
● Convergence is quick even after large indels.
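The convergence follows from the halfway rule: once two sequences agree again, each shared base halves the gap between their curves. A small sketch shows this, assuming a hypothetical corner layout and made-up example sequences (neither comes from the slides):

```python
import math

# Hypothetical corner layout; the slides do not specify one.
CORNERS = {
    "A": (-1.0, -1.0),
    "C": (-1.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, -1.0),
}


def w_curve_xy(sequence):
    """Return the (x, y) position of the curve after each base."""
    x, y = 0.0, 0.0
    points = []
    for base in sequence:
        corner_x, corner_y = CORNERS[base]
        x, y = (x + corner_x) / 2.0, (y + corner_y) / 2.0
        points.append((x, y))
    return points


def gaps(seq_a, seq_b):
    """Distance between the two curves at each base number."""
    return [
        math.dist(p, q)
        for p, q in zip(w_curve_xy(seq_a), w_curve_xy(seq_b))
    ]


# Made-up sequences that differ only by a SNP at base 3 (G -> T).
reference = "ACGTACGTAC"
variant = "ACTTACGTAC"

for base_number, gap in enumerate(gaps(reference, variant), start=1):
    print(base_number, gap)
```

With these inputs the gap is 1.0 at base 3 and halves at every base after it, so by base 7 it is down to 1/16 of its initial size.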

Handling Gaps

● After a gap the curves converge as they do after SNPs, but with a
phase shift.
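The phase shift can be illustrated by comparing point i of a sample with point i + shift of the reference. This is a sketch under assumptions: the corner layout, the function names, and the example sequences are hypothetical, and the shift is supplied by hand rather than detected.

```python
import math

# Hypothetical corner layout; the slides do not specify one.
CORNERS = {
    "A": (-1.0, -1.0),
    "C": (-1.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, -1.0),
}


def w_curve_xy(sequence):
    """Return the (x, y) position of the curve after each base."""
    x, y = 0.0, 0.0
    points = []
    for base in sequence:
        corner_x, corner_y = CORNERS[base]
        x, y = (x + corner_x) / 2.0, (y + corner_y) / 2.0
        points.append((x, y))
    return points


def shifted_gaps(reference, variant, shift):
    """Distance between variant point i and reference point i + shift."""
    ref_points = w_curve_xy(reference)
    var_points = w_curve_xy(variant)
    return [
        math.dist(ref_points[i + shift], var_points[i])
        for i in range(len(var_points))
        if 0 <= i + shift < len(ref_points)
    ]


# Made-up sequences: the variant has lost base 4 ("T") of the reference.
reference = "ACGTACGTACGT"
variant = "ACGACGTACGT"

# After the deletion, variant base i lines up with reference base i + 1.
for gap in shifted_gaps(reference, variant, shift=1):
    print(gap)
```

From the fourth variant base onward the shifted gap halves at every step, the same behavior seen after a SNP, only read one base later on the reference.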

Scoring Curves
● Approximating the distance smooths over SNPs.
● Small angles reduce the difference; large angles add to it.

Needle in a Haystack: CD4 Epitope
● The CD4 epitopes occupy only a few, widely
dispersed locations on gp120.
● Locating portions of the discontinuous epitope
is difficult.
● Variable regions between them change the
locations between samples.
● Portions of the epitope within the variable region
can be hidden by nearby changes.

Analyzing the 3D Structure
● The advantage of W-curves is that even small
features of the gene generate unique geometry.
● Features are easier to identify in 3D than in the 1D
CATG strings.
● By first locating large-scale features, we can
search for smaller ones more easily.
● First align extreme points on the curves.
● Then compare regions between them.
● With a library of fragments, we pick the best match.

W-curve Algorithm & Serial Comparison
● Large-scale features guide the search for
smaller pieces.
● Conserved regions anchor the search.
● After aligning 'peaks' in the curves, we align smaller
and less discriminating features.
● A library of W-curve fragments finds the best fit with
multiple samples.
● Repeatable process allows examining and
scoring large numbers of finer features.

W-curves of HXB2 genome and gp120
● The curve for HXB2 illustrates the most
important features of W-curves.
● Looking at each section of the W-curve you'll notice
that each area is different from the others.
● This is what allows us to locate small features: it is
easier to discern them in 3D than in a character string.
● This figure also highlights the location of gp120.

A detailed view of gp120
● The next slide shows the first portion of HXB2's
env gene: gp120.
● Again, notice that each portion of the curve is
distinct from the others.
● The different conserved (C) and variable (V)
regions are marked across the bottom of the
image.

The CD4 epitope in gp120
● This is where the W-curve really becomes
useful: isolating the epitope locations within
gp120.
● The highlighted areas show the epitope
locations with an additional 3 bases of
conformational region before and after (which
combines a few of the regions).
● Note that the epitope is dispersed and lives
largely in the variable regions.

Clustering With the TSP
● Solutions to the Traveling Salesman Problem
can be used to cluster genes.
● The shortest path clusters more-similar sequences.
● The difficulty is in getting clades out of the TSP.
● One approach uses dummy cities with small
distances to all other cities.
● Dummies end up in the inter-cluster regions.
● This approach has proven fast & repeatable.
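The dummy-city idea can be sketched on a toy problem. Everything specific here is an assumption for illustration: the function names, the dummy distance of 0.5, the six made-up "cities" on a line, and above all the exhaustive search, which stands in for whatever TSP solver was actually used and is only viable for a handful of cities.

```python
from itertools import permutations


def add_dummies(real_dist, n_dummy, dummy_dist):
    """Extend a distance matrix with dummy cities close to every city."""
    n_real = len(real_dist)
    size = n_real + n_dummy
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            if i < n_real and j < n_real:
                matrix[i][j] = real_dist[i][j]
            else:
                # Any pair involving a dummy gets the small distance.
                matrix[i][j] = dummy_dist
    return matrix


def shortest_tour(matrix):
    """Exhaustive search for the shortest closed tour (tiny inputs only)."""
    size = len(matrix)
    best_tour, best_cost = None, float("inf")
    for rest in permutations(range(1, size)):
        tour = (0,) + rest
        cost = sum(matrix[tour[k]][tour[(k + 1) % size]] for k in range(size))
        if cost < best_cost:
            best_tour, best_cost = list(tour), cost
    return best_tour, best_cost


def split_at_dummies(tour, n_real):
    """Cut the closed tour at each dummy; the pieces are the clusters."""
    first = next(k for k, city in enumerate(tour) if city >= n_real)
    rotated = tour[first + 1:] + tour[:first + 1]  # now ends on a dummy
    clusters, current = [], []
    for city in rotated:
        if city >= n_real:
            if current:
                clusters.append(current)
            current = []
        else:
            current.append(city)
    return clusters


# Made-up data: two tight groups of three points on a line.
positions = [0, 1, 2, 10, 11, 12]
real_dist = [[abs(a - b) for b in positions] for a in positions]

matrix = add_dummies(real_dist, n_dummy=2, dummy_dist=0.5)
tour, cost = shortest_tour(matrix)
print(split_at_dummies(tour, n_real=len(positions)))
```

In this toy case the shortest tour routes both long jumps through the dummies, so cutting at them yields the groups {0, 1, 2} and {3, 4, 5} by index. How many dummies to add is left open here.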

Tour0 defines the colors for others.

Clades start to break down in gp41

Clades break down completely in V4

Further Work on Clusters
● Detection.
● Find an algorithm for repeatably assigning the number
of dummy cities.
● Comparison.
● Automate detecting “similar” clusters.
● Time-series analysis.
● Watch sample groups for new members.
● Track evolution of drug resistance in clinical trial
groups, individual patients.

Ongoing Research
● Our goal is to correlate neutralization outcomes.
● Compare small regions near the epitopes.
● Find DNA that clusters similarly to neutralization
data.
● DNA clusters that match the Neutralization data
are “clinical” clades.
● Biggest issue will be deciding what “similar” is.
● Probably a good application for Fuzzy Logic.

Acknowledgments
● Thanks to the authors of the Brown et al. study.
All of the work we've shown you was done on a
computer. Without fieldwork and wet labs, it would
be empty. Next time you sit down to crunch some
numbers, stop and picture for a moment the
process of acquiring them. You'll get a whole new
appreciation for your work.
