SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage
Disequilibrium and Haplotype Analysis

Goal: This tutorial introduces several websites and tools useful for assessing linkage
disequilibrium in your gene or region of interest and for selecting tagSNPs. It covers
the following topics:
    • SeattleSNPs website tools
            o Visual Genotype (VG2)
            o Visual Haplotype (VH1)
    • TagSNP selection tools
            o LDSelect
            o HaploBlockFinder
            o Haploview

Part 1. Using Visual Genotype (VG2) in SeattleSNPs

   1. Go to http://pga.gs.washington.edu
   2. Find VG2 under “Software” (left-hand grey-colored panel of website).
   3. Choose a population (European-descent) using the pull-down menu for “PGA
      Finished Gene PopulationFilter.”
   4. Choose the gene MCP re-sequenced by SeattleSNPs using the pull-down menu
      for “PGA Finished Gene Prettybase Input File.”
   5. Enter “5” in “Rare Allele Percentage (integer, 0 to 50).” This filter displays
      only common SNPs for MCP, using a 5% minor allele frequency (MAF) cutoff.
      Note that the list of SNPs at a specific MAF will depend on the population you
      choose in step #3.
   6. Click on “Run VG2 on the Web!.”
   7. This will return an image of the genotypes for MCP in the European-American
      sample in a pop-up window. The numbers at the top of the image represent the
      SNPs (numbered along a reference sequence used in re-sequencing the gene).
      The numbers on the left side of the image represent the sample ID. Each square
      represents an individual sample’s genotype: homozygous for the common allele
      (blue), heterozygous (red), and homozygous for the rare allele (yellow).
   8. To save the image to your computer, right-click on the image and choose “save
      as.”
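
The MAF filter in step 5 can be sketched in a few lines of Python. This is only an illustration of the idea, not the VG2 code: the site names and genotypes are made up, every SNP is assumed to be biallelic, and treating the cutoff as "at or above" is an assumption.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic SNP.

    `genotypes` holds one two-letter string per sample (e.g. "AG");
    samples with missing data are given as None and skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is not None:
            counts.update(genotype)  # count both alleles of this sample
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or only one allele observed (monomorphic)
    return min(counts.values()) / total

def common_snps(genotypes_by_site, rare_allele_percentage=5):
    """Keep sites whose MAF is at or above the cutoff (assumed inclusive)."""
    cutoff = rare_allele_percentage / 100
    return [site for site, genotypes in genotypes_by_site.items()
            if minor_allele_frequency(genotypes) >= cutoff]

# Hypothetical site numbers and genotypes for four samples.
toy = {
    "000123": ["AA", "AG", "GG", "AA"],
    "000456": ["CC", "CC", "CC", "CT"],
    "000789": ["TT", "TT", "TT", "TT"],
}
print(common_snps(toy, rare_allele_percentage=20))
```

Because allele frequencies differ between populations, running the same filter on a different set of samples can return a different list of sites.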




Part 2. Linkage Disequilibrium (LD) Using VG2 in SeattleSNPs

   1. Complete steps 1 through 5 from Part 1 above.
   2. Choose the LD statistic (r²) using the pull-down menu “Linkage
      Disequilibrium Plot.”
   3. Choose the color of the LD plot (rainbow) using the pull-down menu “LD Shader
      Mode”.
   4. Click on “Run VG2 on the Web!.”
   5. You should have an image of the genotypes and an LD plot appearing in a pop-up
      window.
   6. To save the image to your computer, right-click on the image and choose “save
      as.”
   7. The default LD min and max are 0.5 and 1, but you can change the range
      to 0 to 1. Try this option, then run the default again.
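
For readers who want to see what the r² statistic measures, here is a minimal sketch that computes it for one pair of sites from phased haplotypes. It assumes biallelic sites coded as "0" (common allele) and "1" (rare allele); VG2 works from genotype data, so its own calculation will differ in detail.

```python
def r_squared(haplotypes, i, j):
    """r² between sites i and j.

    `haplotypes` is a list of strings, one per chromosome, where each
    character is "0" or "1" for the allele at that site.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n   # frequency of allele 1 at site i
    p_b = sum(h[j] == "1" for h in haplotypes) / n   # frequency of allele 1 at site j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return 0.0  # r² is undefined for a monomorphic site; 0 is returned here
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    return d * d / denominator

print(r_squared(["00", "00", "11", "11"], 0, 1))  # alleles always travel together
print(r_squared(["00", "01", "10", "11"], 0, 1))  # alleles are independent
```

An r² of 1 means each site predicts the other perfectly; an r² of 0 means the alleles at the two sites are statistically independent in the sample.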




Part 3. Clustering and TagSNP Selection (LDSelect) Using VG2 in SeattleSNPs

   1. Complete steps 1 through 5 listed above in “Using Visual Genotype
      (VG2).” (You can leave out the LD plots from Part 2 if you like.)
   2. To sort the SNPs into clusters of related sites, go to the pull-down menu “Cluster
      and/or Draw Trees For” and choose “SITE.” If you left the setting from Part 2 you
      will receive both a visual genotype and an LD plot clustered by SNP relatedness.
   3. To pick tagSNPs using LDSelect, go to the pull-down menu “Cluster and/or Draw
      Trees For” and choose “LDSelect.” (The default threshold is r² > 0.64. A more
      interactive software package is under development.) You will now have a visual
      genotype with SNPs clustered into bins. Each bin is denoted by a line over the
      SNPs included in that bin. A “*” over a SNP indicates that the SNP is a
      “tagSNP.” Only one tagSNP per bin is required to represent the genetic diversity
      of that bin (NOTE: This is different from other tagSNP algorithms based on
      haplotypes). If a tagSNP does not share a bin with any other SNP, it must be
      genotyped directly because no other SNP will serve as a sufficient proxy.
   Questions: How many bins are in MCP for the European-American population at
   MAF>5%? How many tagSNPs must be genotyped directly because they are not
   contained within a bin with another SNP? How many tagSNPs must be genotyped in
   a European-American population?
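
The binning idea behind step 3 can be illustrated with a simplified greedy procedure. This is a sketch in the spirit of LDSelect, not the LDSelect code itself: the site names and r² values are hypothetical, pairs missing from the table are treated as r² = 0, and the threshold is applied as "at or above."

```python
def greedy_bins(sites, r2, threshold=0.64):
    """Group sites into bins and mark the tagSNPs of each bin.

    `r2` maps frozenset({site_a, site_b}) to an r² value.
    Returns a list of (bin_members, tag_snps) tuples.
    """
    def linked(a, b):
        return r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(sites)
    bins = []
    while remaining:
        # Seed the next bin with the site linked to the most remaining sites.
        seed = max(remaining,
                   key=lambda s: sum(linked(s, t) for t in remaining if t != s))
        members = [s for s in remaining if s == seed or linked(seed, s)]
        # A tagSNP is linked to every other member of its bin.
        tags = [s for s in members
                if all(linked(s, t) for t in members if t != s)]
        bins.append((members, tags))
        remaining = [s for s in remaining if s not in members]
    return bins

# Hypothetical pairwise r² values for six sites.
toy_r2 = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.7,
    frozenset(("B", "C")): 0.5,
    frozenset(("D", "E")): 1.0,
}
for members, tags in greedy_bins(["A", "B", "C", "D", "E", "F"], toy_r2):
    print(members, "tagSNPs:", tags)
```

In this toy example one SNP per bin must be genotyped, so three in total; site F sits in a bin by itself and therefore has to be genotyped directly.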




Part 4. Using Visual Haplotype (VH1) for Haplotype tagSNP Selection in
SeattleSNPs

   1. Go to http://pga.gs.washington.edu
   2. Find VH1 under “Software” (left-hand grey-colored panel of website). This
      software has an interface similar to VG2 but displays haplotypes. Haplotypes
      represent the alleles of each SNP assigned to an individual’s chromosomes. Each
      individual carries two copies of each autosome, one maternal and one paternal,
      inherited from his or her parents. The visual haplotype will be
      twice as long as the visual genotype because each individual is now represented
      by two rows of data (haplotypes) instead of just one row of data (genotypes).
      NOTE: Be aware that a proportion of the genes re-sequenced by SeattleSNPs are
      X-linked. In this situation, males have one X chromosome and females have two
      X chromosomes.
   3. Choose a population (European-descent) using the pull-down menu for “PGA
      Finished Gene PopulationFilter.”
   4. Choose the gene CRP re-sequenced by SeattleSNPs using the pull-down menu for
      “PGA Finished Gene Prettybase Input File.”
   5. To pick tagSNPs to represent common genetic variation, we suggest you filter by
      minor allele frequency for common SNPs. Enter 10 in “Rare Allele Percentage
      (integer, 0 to 50).” Note that the list of SNPs at a specific MAF will depend on
      the population you choose in step #3.
   6. Under haplotype sorting, choose “haplotype by frequency.”
   7. To identify SNPs in haplotypes that are correlated (or contained in a “block”), sort
      by site. At “Cluster By:” choose “SITE.”
   8. Click on “Run VH1 on the Web!”
   9. You should have an image of the haplotypes in a pop-up window. The numbers
      at the top of the image represent the SNPs (numbered along a reference sequence
      used in re-sequencing the gene). The SNPs here are sorted according to site
      relatedness. The numbers on the side of the image represent the sample ID. Each
      square represents an individual sample’s allele: common (blue) and rare (yellow)
      allele. Each row represents the individual sample’s haplotype, and each
      individual will have two rows representing the two chromosomes. You can
      identify “blocks” manually using VH1.
   Questions: How many haplotypes do you have? How many tagSNPs would you
   genotype?
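
The “haplotype by frequency” sorting in step 6 amounts to counting identical rows and ordering them from most to least common. A minimal sketch, using made-up three-SNP haplotypes (two rows per individual):

```python
from collections import Counter

def haplotypes_by_frequency(chromosomes):
    """Return (haplotype, frequency) pairs, most common first.

    `chromosomes` is a list of allele strings, one per chromosome.
    """
    counts = Counter(chromosomes)
    total = len(chromosomes)
    return [(haplotype, n / total) for haplotype, n in counts.most_common()]

# Hypothetical phased data for three individuals (six chromosomes).
toy = ["CCT", "CCT", "TCT", "CCT", "TGA", "TCT"]
for haplotype, frequency in haplotypes_by_frequency(toy):
    print(haplotype, round(frequency, 2))
```

The number of distinct rows returned is the number of haplotypes observed in the sample.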




Part 5. Where to Find Haplotypes in SeattleSNPs

   1. In addition to VH1, we offer PHASEv2.0 output for each of our genes re-
      sequenced on the SeattleSNPs website. On the home page, click on “Genes
      Sequenced for SNPs” (left side).
   2. Choose MCP.
   3. PHASE output is found in the “Haplotyping Data” section of the gene’s web
      page. We also offer a static image of the haplotypes in this section. To
      manipulate this image, use VH1 under software and create a new haplotype image
      for your gene of interest.




Part 6. Downloading Genotype Data from HapMap

   1. Go to http://www.hapmap.org
   2. Click on “Generic Genome Browser” on left side of website.
   3. In “Landmark or Region” field, type “membrane cofactor protein” for MCP.
   4. Click on the first of the five entries for MCP (4179). You should see the gene
      MCP with 15 genotyped SNPs (denoted by little pie charts symbolizing the allele
      frequency for each population sample genotyped).
   5. Scroll down to the bottom of the web page and look for “Dumps, Searches and
      other Operations.” Choose “Dump SNP genotype data” from the pull-down
      menu.
   6. Click on “Configure.” Choose a population (CEU is CEPH or European-descent).
      Then click on “Save to Disk.” Alternatively, you can click on “Open directly in
      HaploView” if you have Haploview loaded on your computer. Click “Go.”

Part 7. Using HapMap Data in Haploview

   1. Download and install Haploview 3.2 from
       http://www.broad.mit.edu/mpg/haploview/index.php
   2. Open Haploview. Click on “Load HapMap data.” Load the file you saved in the
       previous section (Downloading Genotype Data from HapMap). If you are
       connected to the internet, click on “Download and show HapMap info track?”
       Click “OK.” Alternatively, if you did not save the file from the previous section,
       you can download the file “mcp_hapmap.txt” from
       http://pga.gs.washington.edu/wustl/data_files/datafiles.html Load this file onto
       Haploview and click “OK.”
   3. The first view of the data is the “check markers” window. This provides a nice
       summary of the marker data, including name of the markers, genomic position of
       the markers, observed heterozygosity, predicted heterozygosity, Hardy-Weinberg p-value,
       % samples successfully genotyped, the number of fully genotyped family trios for
       each marker, the number of Mendelian inheritance errors, minor allele frequency,
       and pass/fail quality control for each marker. Two markers fail (denoted in red)
       in the MCP dataset because they are monomorphic in the samples genotyped (no
       heterozygotes or homozygotes for the rare allele).
   4. Haploview offers a visual of the LD statistic. Click on the “LD” tab. To change
       the LD statistic shown, click on “Display” and select the statistic of your
       choice. You can change the haplotype block definition by going to “Analysis” and
       selecting a block definition. The default is the block definition by Gabriel et al.
       in Science (2002). For this example, choose “four-gamete rule.”
   Questions: How many blocks are in MCP for the European-descent population using
   the default block definition in Haploview? How does changing the definition change
   the block structure in MCP?
   5. By default, if the LD statistic is 1.0 for a particular marker pair, the number 1.0 is
       not shown in the figure. Any LD statistic less than 1.0 is shown in the figure.
       Right-click on the “95” square. This pop-up will give you statistics related to the
       pair of markers used in calculating this LD statistic.
   Questions: How far apart are these two markers physically?
   6. For haplotypes, click on the “Haplotypes” tab. Haplotype frequencies are
      displayed on the right side of each haplotype. The triangles above the haplotypes
      denote the haplotype tagging SNPs. In cases of complex haplotypes (not shown
      here for the HapMap data for MCP), there will be lines connecting haplotypes,
      denoting their relationship to one another. Also, there will be a multiallelic D′
      value.
   Questions: How many haplotypes were identified in this dataset? How many
   haplotype tagging SNPs were identified?
   7. For the minimal set of tagSNPs, go to the “Tagger” tab. You can choose the
      algorithm used to define tagSNPs. For this example, choose “pairwise tagging
      only”. Then click “Run Tagger.” The results are displayed so that the tagSNPs
      are on the left of the screen. The right side of the screen shows which SNPs are
      being tagged by other SNPs.
   Questions: Using “Tagger,” how many Haploview tagSNPs are in MCP for the
   European-descent HapMap data?
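
The D′ statistic mentioned in steps 5 and 6 can be computed for a pair of biallelic sites as sketched below. This is an illustration only, assuming phased haplotypes coded "0"/"1" and assuming that the number in a square such as “95” is 100 × D′; Haploview estimates these values from the genotype data itself.

```python
def d_prime(haplotypes, i, j):
    """|D′| between sites i and j of "0"/"1" coded haplotype strings."""
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    # D is scaled by the largest value it could take given the allele frequencies.
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    if d_max == 0:
        return 0.0  # undefined for a monomorphic site; 0 is returned here
    return abs(d) / d_max

print(d_prime(["00", "00", "01", "11"], 0, 1))  # only three of four gametes seen
print(d_prime(["00", "01", "10", "11"], 0, 1))  # all four gametes, equally common
```

D′ reaches 1 whenever at least one of the four possible two-site haplotypes is absent, even if the two sites have different allele frequencies, which is why D′ and r² can disagree for the same marker pair.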




Part 8. Using SeattleSNPs Data in Haploview

   1. When loading the SeattleSNPs genotypes for MCP for this exercise, click on
      “load phased genotypes.”
   2. Download “mcpxx.haploview_input.txt” and “mcpxx_locus_info.txt” from
      http://pga.gs.washington.edu/wustl/data_files/datafiles.html. The input file here is
      MCP haplotype data (using PHASEv2.1) for European-Americans. Load this file
      onto Haploview and click “OK.” Repeat steps 4, 5, 6, and 7 from “Using
      HapMap Data in Haploview.” Note the difference between complete variation
      data and sampled variation data.
   Questions: How many tagSNPs are identified in complete variation data for MCP?

Part 9. HaploBlockFinder

   1. HaploBlockFinder accepts several formats, including PHASE. Download a
       PHASE output file from
       http://pga.gs.washington.edu/wustl/data_files/datafiles.html called
       “mcpxx.ED.phase.out”. Also, download the marker file “mcpxx_locus_info.txt”.
   2. Go to http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi
   3. Enter the path of the PHASE file in the “Load haplotype file” field. Also, enter
       the path of the marker file in the “Load locus information file” field.
   4. Choose the block definition from the pull-down menu. For this example, use the
       “four gamete test.”
   5. At “Minor allele frequency lower bound,” specify a MAF of 5% (0.05 in the field).
       For many genes, the program will not run on the website with all SNPs. To use
       all SNPs, you will have to download the program onto your computer and run the
       program locally.
   6. Choose “Yes” for “Show LD matrix” and “Get tagging SNPs.”
   7. Click on “Go.”
   8. It may take a while to receive results, which is most likely due to the tagging SNP
       option. The results are returned zipped, so they must be saved and unzipped
       before you can open them. You should have several text files, which can be
       opened in Excel, and two .png files. The .png files are the LD matrix output and
       the visualization of the block structure for your gene of interest. The text files list
       the blocks and tagSNPs required to represent the blocks. Note that all the
       identified tagSNPs per block are required in an association study (unlike
       LDSelect where only one tagSNP per bin is required).
   Questions: How many blocks did HaploBlockFinder identify? How many tagSNPs
   did HaploBlockFinder identify?
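
The “four gamete test” chosen in step 4 can be illustrated as follows. For two biallelic sites, seeing all four possible two-site haplotypes suggests historical recombination between them. The block-building loop below is a deliberately simple assumption (a block ends as soon as the next site shows four gametes with any site already in the block); HaploBlockFinder and Haploview may apply additional rules, such as frequency cutoffs.

```python
def four_gametes(haplotypes, i, j):
    """True if all four two-site haplotypes are observed at sites i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4

def four_gamete_blocks(haplotypes):
    """Split sites 0..n-1 into consecutive blocks with no four-gamete pair."""
    n_sites = len(haplotypes[0])
    blocks, current = [], [0]
    for site in range(1, n_sites):
        if any(four_gametes(haplotypes, previous, site) for previous in current):
            blocks.append(current)   # recombination signal: close the block
            current = [site]
        else:
            current.append(site)
    blocks.append(current)
    return blocks

# Hypothetical haplotypes: sites 0 and 1 are perfectly correlated,
# while site 2 shows all four gametes with site 0.
toy = ["000", "001", "110", "111"]
print(four_gamete_blocks(toy))
```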




Part 10. TagSNP Selection Using LDSelect in Perl

   1. You need an operating system with Perl 5.0 or above installed. Windows
      users will probably need to download Perl from www.perl.com. Mac OS X
      users can run Perl from a new terminal window.
   2. Download the LDSelect Perl code from
      http://droog.gs.washington.edu/ldSelect.html by right-clicking (Windows/Linux)
      or control-clicking (OSX) on the download link "ldSelect.pl" and choosing
      “Download link to disk.” Mac OS X users will then have to make the file
      executable. To do so, open a Unix command
      prompt (e.g. using the terminal or X11 program), navigate to the directory where
      you downloaded the script, and type “chmod +x ldselect.pl” (type the file name
      exactly as it was saved).
   3. You also need genotype data in a "prettybase" format. For this workshop, we
      have data for the gene MCP for European-Americans available in several formats.
      Go to http://pga.gs.washington.edu/wustl/data_files/datafiles.html and download
      "mcpxx.prettybase.txt.ED.txt"
   4. At the command line, type "perl ldselect.pl -pb mcpxx.prettybase.txt.ED.txt" and
      enter.
   5. If you want to save the output, type "perl ldselect.pl -pb
      mcpxx.prettybase.txt.ED.txt > output.txt". To view the output, type "more
      output.txt".
   6. If you want to know where the tagSNP is in relation to the gene, you need a SNP
      context file. Download "mcpxx.context.txt" from
      http://pga.gs.washington.edu/wustl/data_files/datafiles.html. Then, type "perl
      ldselect.pl -pb mcpxx.prettybase.txt.ED.txt -context mcpxx.context.txt >
      output.txt". The output file will now have the tagSNPs and other SNPs labeled
      with two letters. The first letter indicates if the SNP is in a unique region (U) or
      repeat region (R). The second letter indicates if the SNP is nonsynonymous (N),
      synonymous (S), intronic (I), UTR (T) or flanking (F).
   7. If you want a specific minor allele frequency, use the -freq flag. For example, for
      MAF >5%, type "perl ldselect.pl -pb mcpxx.prettybase.txt.ED.txt -freq 0.05 >
      output.txt". The default r² threshold is 0.64. To change it, use the -r2 flag.
      Increasing the r² threshold generally increases the number of tagSNPs that
      LDSelect identifies; by how much depends on the linkage disequilibrium in the
      genomic region of interest.
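
If you plan to try several thresholds, it can help to build the command lines programmatically. The sketch below only assembles and prints the commands; it uses the flags named in the steps above (-pb, -context, -freq, -r2), and combining -freq and -r2 in one call is an assumption. The helper name `ldselect_command` is made up for this example.

```python
import shlex

def ldselect_command(prettybase, freq=None, r2=None, context=None,
                     script="ldselect.pl"):
    """Return the argument list for one LDSelect run (nothing is executed)."""
    argv = ["perl", script, "-pb", prettybase]
    if context is not None:
        argv += ["-context", context]
    if freq is not None:
        argv += ["-freq", str(freq)]
    if r2 is not None:
        argv += ["-r2", str(r2)]
    return argv

# Print one command per r² threshold for the workshop's MCP file.
for threshold in (0.5, 0.64, 0.8):
    argv = ldselect_command("mcpxx.prettybase.txt.ED.txt", freq=0.05, r2=threshold)
    print(shlex.join(argv))
```

Each printed line can be pasted at the command line with "> output.txt" appended, as in step 5, so that the tagSNP counts at different thresholds can be compared.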

Answers to Questions:

How many bins are in MCP for the European-American population at MAF>5%? 7 bins
with >1 SNP; 5 “bins” with only one SNP.

How many tagSNPs must be genotyped directly because they are not contained within a
bin with another SNP? 5

How many tagSNPs must be genotyped in a European-American population? 12

How many haplotypes do you have? 4

How many tagSNPs would you genotype? 3-4

How many blocks are in MCP for the European-descent population using the default
block definition in Haploview? One

How does changing the definition change the block structure in MCP? Changing the
definition causes the block boundary to change in MCP.

How far apart are these two markers physically? 3.1kb

How many haplotypes were identified in this dataset? 7

How many haplotype tagging SNPs were identified? 6

Using “Tagger,” how many Haploview tagSNPs are in MCP for the European-descent
HapMap data? 6

How many tagSNPs are identified in complete variation data for MCP in a European-
descent sample? 31



How many blocks did HaploBlockFinder identify? 9

How many tagSNPs did HaploBlockFinder identify? 21




What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Último (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
