Hands-on Exercises – Day 1
Sucheta Tripathy, VBI
Genetics of being Phytophthora?
• Objective: Find a coding sequence that is unique to Phytophthora.
• What is the starting material?
– 16 million RNA-Seq reads are assembled against the P. sojae reference sequence to identify splice junctions. These junctions are evaluated using some of the best available algorithms.
• http://vmd.vbi.vt.edu/download/data/workshop2010/
– Coverage.wig
– ps1V1.fasta
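The slide does not describe the layout of Coverage.wig. Because the transcript discovery step refers to read hits in column 4, the minimal sketch below assumes bedGraph-style data lines (scaffold, start, end, hits; whitespace-separated) and skips `track`, `browser` and `#` header lines. The names `CoverageRecord` and `read_coverage` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class CoverageRecord(NamedTuple):
    """One data line of a bedGraph-style coverage file (assumed layout)."""
    chrom: str
    start: int   # 0-based start, as in bedGraph (assumption)
    end: int     # end-exclusive
    hits: float  # column 4: number of read hits


def read_coverage(lines: Iterable[str]) -> List[CoverageRecord]:
    """Parse 4-column coverage lines, skipping blanks and header lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a 4-column data line; ignore it
        records.append(
            CoverageRecord(fields[0], int(fields[1]), int(fields[2]), float(fields[3]))
        )
    return records
```

Usage would be `with open("Coverage.wig") as fh: records = read_coverage(fh)`. If the file is instead a true wiggle file (`variableStep` / `fixedStep` sections), its data lines have fewer than four columns, this parser returns nothing, and a different reader would be needed.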
Transcript discovery
• Sort the coverage file by the number of read hits (column 4).
• Keep the upper 25% (the top quartile by read hits).
• Remove sequences longer than 1000 bases or shorter than 10 bases.
• Fetch the corresponding sequences from the ps1V1.fasta file.
• Split the resulting FASTA file into N equal parts.
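The five steps above can be sketched in Python as follows, under stated assumptions: coverage regions are `(scaffold, start, end, hits)` tuples with 0-based, end-exclusive coordinates (as in bedGraph); "upper 25%" is read as the first quarter of the hit-sorted list, rounded up; length is `end - start`; and ps1V1.fasta has already been loaded into a dict of scaffold name to sequence. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A coverage region: (scaffold, start, end, read_hits); 0-based, end-exclusive (assumption).
Region = Tuple[str, int, int, float]
FastaRecord = Tuple[str, str]  # (header, sequence)


def top_quartile(regions: List[Region]) -> List[Region]:
    """Sort by read hits (column 4), highest first, and keep the top 25%."""
    ranked = sorted(regions, key=lambda r: r[3], reverse=True)
    keep = math.ceil(len(ranked) / 4)  # round up so small inputs keep something
    return ranked[:keep]


def length_filter(regions: List[Region], min_len: int = 10, max_len: int = 1000) -> List[Region]:
    """Drop regions shorter than min_len or longer than max_len bases."""
    return [r for r in regions if min_len <= r[2] - r[1] <= max_len]


def fetch_sequences(regions: List[Region], genome: Dict[str, str]) -> List[FastaRecord]:
    """Cut each region out of the reference (e.g. ps1V1.fasta loaded as a dict)."""
    records = []
    for scaffold, start, end, _hits in regions:
        if scaffold in genome:  # skip scaffolds missing from the FASTA
            records.append((f"{scaffold}:{start}-{end}", genome[scaffold][start:end]))
    return records


def split_into_parts(records: List[FastaRecord], n: int) -> List[List[FastaRecord]]:
    """Deal records round-robin into n parts whose sizes differ by at most one."""
    parts: List[List[FastaRecord]] = [[] for _ in range(n)]
    for i, record in enumerate(records):
        parts[i % n].append(record)
    return parts


def to_fasta(records: List[FastaRecord]) -> str:
    """Format records as FASTA text (one sequence line per record)."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Here "equal parts" means an equal number of sequences per part, not an equal number of bases; each part can then be written with `to_fasta` to its own file (hypothetical names such as `part_0.fasta`) and processed in parallel in the annotation steps.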
Annotation Steps
• BLAST against the P. sojae gene models (vmd.vbi.vt.edu/toolkit).
• Check coding potential with the P. sojae codon usage tables.
- If a hit is found, retrieve the gene model, compare its splice sites with the RNA-Seq junctions, and correct the model where they disagree.
- If no hit is found, BLAST against the P. ramorum, H. arabidopsidis and P. infestans coding sequences.
- Check whether those hits match the splice junctions; if they do not, the gene models in those organisms are INCORRECT.
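The slide does not say how coding potential is scored from a codon usage table. One common approach, sketched here as an assumption rather than the workshop's own method, is a codon-usage log-likelihood ratio: over a reading frame, sum log(P(codon | usage table) / P(codon | uniform)); a clearly positive total suggests coding sequence. The toy usage table in the test is invented for illustration; real frequencies would come from a P. sojae codon usage table. The second function compares a gene model's introns with RNA-Seq junctions, assuming both are `(start, end)` pairs in the same coordinate system and strand. All names are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def codon_log_odds(seq: str, usage: Dict[str, float], frame: int = 0, pseudo: float = 1e-4) -> float:
    """Sum of log(P(codon | usage) / P(codon | uniform)) over one forward frame.

    `usage` maps codons to relative frequencies (summing to about 1). Codons
    absent from the table, or with tiny frequencies, are floored at `pseudo`.
    """
    seq = seq.upper().replace("U", "T")
    background = 1.0 / 64  # uniform codon model
    score = 0.0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if any(base not in "ACGT" for base in codon):
            continue  # skip codons containing N or other ambiguity codes
        score += math.log(max(usage.get(codon, 0.0), pseudo) / background)
    return score


def best_frame(seq: str, usage: Dict[str, float]) -> Tuple[int, float]:
    """Return (frame, score) for the highest-scoring of the three forward frames."""
    scores = [(f, codon_log_odds(seq, usage, f)) for f in range(3)]
    return max(scores, key=lambda fs: fs[1])


def compare_junctions(
    model_introns: Iterable[Tuple[int, int]],
    rnaseq_junctions: Iterable[Tuple[int, int]],
) -> Dict[str, Set[Tuple[int, int]]]:
    """Classify introns as confirmed by RNA-Seq, model-only, or reads-only.

    Reads-only junctions are candidates for correcting the gene model.
    """
    model, reads = set(model_introns), set(rnaseq_junctions)
    return {
        "confirmed": model & reads,
        "model_only": model - reads,
        "reads_only": reads - model,
    }
```

Only forward frames are scored; the reverse complement would need to be scored separately. Exact coordinate matching is deliberately strict, so an off-by-one convention mismatch between the gene model and the junction file would show up as disagreement everywhere.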
Annotation
• BLAST against the nr database. If no hit to any coding sequence is found in nr, you have most probably found a new gene.
• Check whether the sequence contains a signal peptide or targeting peptide to determine whether it is secreted.
• Run a MEME motif search on the sequence.
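To flag candidate new genes after the nr search, one option (an assumption; the slide does not say how a "hit" is defined) is to run BLAST+ with tabular output (`-outfmt 6`, where column 1 is the query ID and column 11 the e-value) and list the queries with no hit at or below an e-value cutoff. The 1e-5 cutoff and the function name are illustrative choices, not part of the exercise.

```python
from typing import Iterable, List, Set


def queries_without_hits(
    query_ids: Iterable[str],
    blast_tab_lines: Iterable[str],
    max_evalue: float = 1e-5,
) -> List[str]:
    """Return query IDs with no BLAST hit at or below max_evalue.

    blast_tab_lines: lines of BLAST+ tabular output (-outfmt 6); column 1 is
    qseqid and column 11 is the e-value. Comment and blank lines are skipped.
    """
    hit: Set[str] = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 11 and float(fields[10]) <= max_evalue:
            hit.add(fields[0])
    # Keep the caller's order so results line up with the input FASTA.
    return [q for q in query_ids if q not in hit]
```

The query IDs would normally be the FASTA headers of the sequences that were searched, since queries with no hit at all never appear in tabular output.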
