Taking a walk on the W-side:
Comparing Epitopes on HIV-1
  with the W-curve & TSP.



                                   
Douglas J. Cork1,2,4, Steven Lembark3, Bruce K. Brown1,4, Victoria 
        R. Polonis1,4, Jerome Kim1,4, Nelson L. Michael5

    US Military HIV Research Program (MHRP)/Henry Jackson 
 Foundation(HJF)1, Rockville, MD., Illinois Institute of Technology2, 
 Chicago, IL., Workhorse Computing3, Woodhaven, NY., Walter Reed 
  Army Institute For Research4, Rockville, MD., Walter Reed Army 
               Institute for Research, Washington, DC5
Statistically, HIV-1 is a problem.
●   One of the major problems in studying HIV-1 is 
    the apparent randomness of clinical response.
    ●   Tests using clades based on genome sequences 
        show no correlation with immune response.
●   Part of the answer may be clades based on 
    smaller, clinically-specific sequences.
    ●   HIV-1 mutates 10,000 times faster than people.
    ●   Existing clades end up including too much white 
        noise to correlate well with anything.
The Structure of HIV-1
●   gp120 is the primary focus for immune studies.
●   gp120 and gp41 make up the envelope protein, gp160.
Standard Clades vs. Neutralization Data
●   Standard clades of HIV-1 are based on 
    phylogenetic trees of the genome.
    ●   They do not correlate well with neutralization data.
    ●   Between-clade and within-clade results have similar variability.
    ●   Antibody and cell studies have low correlation for 
        within-clade results.
●   Lack of a correlation prevents developing any 
    broadly neutralizing treatments.
    ●   Today we have to sequence the virus to treat it.
Example: Cross-clade neutralization shows no 
useful pattern in Peripheral Blood Mononuclear 
Cell or Pseudovirus Assay studies.
●   Bubble plot.
●   No real relationship.
Neutralization Heat Map
●   Distribution of response to antibody pools lacks any 
    correlation with the standard clades.
HIV-1 Genetics Complicate Analysis
●   Genes and proteins are normally reported with 
    respect to a single strain, HXB2.
    ●   Hard to compare local features between strains.
    ●   Need to re-discover them for each study.
●   Neutralization data are specific to gp120.
    ●   Variable regions in gp120 leave corresponding 
        locations in different samples off by tens of bases.
    ●   Antibody binding sites (epitopes) are only a few 
        bases long, with a majority in the variable regions.
Another approach: W-curves
●   The W-curve is based on chaos and game 
    theory.
●   It abstracts a sequence of DNA into a 
    three-dimensional structure.
    ●   Originally designed for visualization, we have now 
        adapted it for machine comparison.
●   Geometric analysis of the curves allows for 
    piecewise comparison of the sequences.
The W-curve
●   Start with a square at the origin and a discrete 
    Z-axis matching the sequence base numbers.
●   Each point moves halfway towards the corner 
    for the next base.
●   All curves start at (0,0,0).
●   The curve (blue) moves halfway towards “C” then “G” (red lines).
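
The construction described on the last two slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the corner layout in `CORNERS` and the function name `w_curve` are assumptions, since the slides do not say which base sits on which corner of the square.

```python
from typing import Dict, List, Tuple

# Assumed corner layout: the slides do not state which base maps to which corner.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (1.0, 1.0),
    "C": (-1.0, 1.0),
    "G": (-1.0, -1.0),
    "T": (1.0, -1.0),
}


def w_curve(seq: str) -> List[Tuple[float, float, int]]:
    """Return (x, y, z) points; z is the 1-based base number, z = 0 is the origin."""
    x, y = 0.0, 0.0
    points = [(x, y, 0)]  # every curve starts at (0, 0, 0)
    for z, base in enumerate(seq.upper(), start=1):
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner for this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y, z))
    return points


print(w_curve("CG"))
```

With this assumed layout, the sequence "CG" gives the points (0, 0, 0), (-0.5, 0.5, 1) and (-0.75, -0.25, 2): halfway to “C”, then halfway from there to “G”.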
Autoregression
●   Curves converge by base 7 after a SNP at base 3.
●   Convergence is quick even after large indels.
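
The convergence follows from the halfway rule: once two sequences agree again, each matching base halves the distance between their curves. The sketch below illustrates this under assumptions of its own: the corner layout, the helper names `xy_points` and `curve_gap`, and the two toy sequences are all hypothetical and not taken from the slides.

```python
from math import hypot
from typing import List, Tuple

# Assumed corner layout; the slides do not specify one.
CORNERS = {"A": (1.0, 1.0), "C": (-1.0, 1.0), "G": (-1.0, -1.0), "T": (1.0, -1.0)}


def xy_points(seq: str) -> List[Tuple[float, float]]:
    """XY position of the curve after each base; index 0 is the origin."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def curve_gap(a: str, b: str) -> List[float]:
    """XY distance between two curves at each shared base number."""
    return [hypot(pa[0] - pb[0], pa[1] - pb[1])
            for pa, pb in zip(xy_points(a), xy_points(b))]


reference = "ACGTACGTAC"
variant = "ACTTACGTAC"  # toy SNP at base 3 (G -> T)
gaps = curve_gap(reference, variant)
for base in range(3, 8):
    print(base, gaps[base])
```

In this toy case the gap is 1.0 at the SNP and has shrunk to 1/16 of that by base 7, four matching bases later.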
Handling Gaps
●   After a gap, curves converge as they do after SNPs, but with a 
    phase shift.
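
A gap can be illustrated the same way as a SNP if one curve is offset along the Z-axis before comparing. The sketch below is an assumption-laden toy, not the authors' method: the corner layout, the helper names `xy_points` and `shifted_gap`, and the example sequences are hypothetical.

```python
from math import hypot
from typing import List, Tuple

# Assumed corner layout; the slides do not specify one.
CORNERS = {"A": (1.0, 1.0), "C": (-1.0, 1.0), "G": (-1.0, -1.0), "T": (1.0, -1.0)}


def xy_points(seq: str) -> List[Tuple[float, float]]:
    """XY position of the curve after each base; index 0 is the origin."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def shifted_gap(longer: str, shorter: str, shift: int) -> List[float]:
    """XY distance between longer[i + shift] and shorter[i] for every valid i."""
    a, b = xy_points(longer), xy_points(shorter)
    return [hypot(a[i + shift][0] - b[i][0], a[i + shift][1] - b[i][1])
            for i in range(len(b)) if i + shift < len(a)]


reference = "ACGTACGTAC"
deleted = "ACTACGTAC"  # toy deletion: base 3 (G) removed
gaps = shifted_gap(reference, deleted, shift=1)
for i in range(2, len(gaps)):
    print(i, gaps[i])
```

Once the one-base phase shift is applied, the distance halves with each matching base, just as it does after a SNP.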
Scoring Curves
●   Approximating the distance smooths over SNPs.
●   Smaller angles reduce the difference; large angles add to it.
Needle in a Haystack: CD4 Epitope
●   The CD4 epitopes occupy only a few, widely 
    dispersed locations on gp120.
●   Locating portions of the discontinuous epitope 
    is difficult.
    ●   Variable regions between them change the 
        locations between samples.
    ●   Portions of the epitope within the variable region 
        can be hidden by nearby changes.
Analyzing the 3D Structure
●   The advantage to W-curves is that even small 
    features of the gene generate unique geometry.
    ●   Features are easier to identify in 3D than the 1D 
        CATG-strings.
●   By first locating large-scale features, we can 
    search for smaller ones more easily.
    ●   First align extreme points on the curves.
    ●   Then compare regions between them.
    ●   With a library of fragments, we pick the best match.
W-curve Algorithm & Serial Comparison
●   Large-scale features guide the search for 
    smaller pieces.
    ●   Conserved regions anchor search.
    ●   After aligning 'peaks' in the curves, we align smaller 
        and less discriminating features.
    ●   A library of W-curve fragments finds best fit with 
        multiple samples.
●   Repeatable process allows examining and 
    scoring large numbers of finer features.
W-curves of HXB2 genome and gp120
●   The curve for HXB2 illustrates the most 
    important features of W-curves.
    ●   Looking at each section of the W-curve you'll notice 
        that each area is different from the others.
    ●   This is what allows us to locate small features: it is 
        easier to discern them in 3D than a character string.
●   This figure also highlights the location of gp120.
A detailed view of gp120
●   The next slide shows the first portion of HXB2's 
    env gene: gp120.
●   Again, notice that each portion of the curve is 
    distinct from the others. 
●   The different conserved (C) and variable (V) 
    regions are marked across the bottom of the 
    image.
The CD4 epitope in gp120
●   This is where the W-curve really becomes 
    useful: isolating the epitope locations within 
    gp120.
●   The highlighted areas show the epitope 
    locations with an additional 3 bases of 
    conformational region before and after (which 
    combines a few of the regions).
●   Note that the epitope is dispersed and lives 
    largely in the variable regions.
Clustering With the TSP
●   Solutions to the Traveling Salesman Problem 
    can be used to cluster genes.
    ●   The shortest path clusters more-similar sequences.
●   The difficulty is in getting clades out of the TSP.
    ●   One approach uses dummy cities with small 
        distances to all other cities.
    ●   Dummies end up in the inter-cluster regions.
●   This approach has proven fast & repeatable.
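
The dummy-city idea can be shown on a toy problem. The sketch below is a hedged illustration only: the exhaustive solver is a stand-in that works for a handful of cities (the slides do not name a solver), and `cluster_with_dummies`, `tour_length`, `dummy_dist` and the sample positions are hypothetical names and values. Every edge touching a dummy gets the same small cost, and the finished tour is cut at the dummies to yield clusters.

```python
from itertools import permutations
from typing import List, Sequence


def tour_length(tour: Sequence[int], dist: Sequence[Sequence[float]]) -> float:
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def cluster_with_dummies(dist: Sequence[Sequence[float]],
                         n_dummies: int,
                         dummy_dist: float) -> List[List[int]]:
    """Cluster cities by cutting the shortest tour at the dummy cities."""
    if n_dummies < 1:
        raise ValueError("need at least one dummy city to cut the tour")
    n_real = len(dist)
    n = n_real + n_dummies
    # Extended matrix: indices >= n_real are dummies; any edge touching one costs dummy_dist.
    ext = [
        [0.0 if i == j else (dist[i][j] if i < n_real and j < n_real else dummy_dist)
         for j in range(n)]
        for i in range(n)
    ]
    # Exhaustive search with city 0 fixed; only practical for very small inputs.
    best = min(((0,) + p for p in permutations(range(1, n))),
               key=lambda t: tour_length(t, ext))
    # Rotate the tour so it starts at a dummy, then cut it at every dummy.
    start = next(i for i, city in enumerate(best) if city >= n_real)
    rotated = best[start:] + best[:start]
    clusters: List[List[int]] = []
    current: List[int] = []
    for city in rotated:
        if city >= n_real:
            if current:
                clusters.append(sorted(current))
            current = []
        else:
            current.append(city)
    if current:
        clusters.append(sorted(current))
    return sorted(clusters)


positions = [0, 1, 2, 10, 11, 12]  # two obvious groups on a line
dist = [[abs(a - b) for b in positions] for a in positions]
print(cluster_with_dummies(dist, n_dummies=2, dummy_dist=0.1))
```

For this input the two dummies land in the long hops between the groups, so the tour splits into [0, 1, 2] and [3, 4, 5]. On real data a heuristic TSP solver would replace the exhaustive search, and the number of dummies sets the maximum number of clusters.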
Tour-0 defines the colors for others.
Clades start to break down in gp41
C5 needs more groups.
Clades break down completely in V4
Further Work on Clusters
●   Detection.
    ●   Find algorithm for repeatably assigning the number 
        of dummy cities.
●   Comparison.
    ●   Automate detecting “similar” clusters.
●   Time-series analysis.
    ●   Watch sample groups for new members.
    ●   Track evolution of drug resistance in clinical trial 
        groups, individual patients.
Ongoing Research
●   Our goal is to correlate neutralization outcomes.
    ●   Compare small regions near the epitopes.
    ●   Find DNA that clusters similarly to neutralization 
        data.
●   DNA clusters that match the Neutralization data 
    are “clinical” clades.
    ●   Biggest issue will be deciding what “similar” is.
    ●   Probably a good application for Fuzzy Logic.
Acknowledgments
●   Thanks to the authors of the Brown et al. study.
      All of the work we've shown you was done on a 
      computer. Without fieldwork and wet labs, it would 
      be empty. Next time you sit down to crunch some 
      numbers, stop and picture for a moment the 
      process of acquiring them. You'll get a whole new 
      appreciation for your work.
