SlideShare uma empresa Scribd logo
1 de 8
1 of 8
Bioinformatics
Bioinformatics are using of computers to store, organize and analyze biological information
(particularly that based on DNA and protein sequence).
A large number of different kinds of data have now been incorporated into biological
databases and can be said to be part of the domain of bioinformatics:
 DNA sequence, RNA sequence, Protein sequence
 Data from the analysis of transcription across whole genomes (transcriptomics), and
analysis of the expressed protein co mplement of cells or tissues under particular
circumstances (proteomics).
 Data from other ‘omics experiments, including metabolomics, proteomics, and
methods for studying interactions between molecules on a large scale, such as two-
hybrid analysis.
 Structural data from X-ray crystallography and nuclear magnetic resonance (NMR)
 Common Sequence Analyses
 compiling and editing sequences
 manipulating sequences
 complement, reverse, or both
 translating
 structural analysis
 searching for coding regions
 physical properties
 comparisons and homology searches
 aligning sequences
 sites and motifs
 gene identification
Basic information
o DNA contains the genetic information that encodes traits.
o Complementary: Complementary bases are two bases that are able to pair with each
other and create a base pair. In DNA, adenine (A) and thymine (T) are
complementary, and the A on one DNA strand interacts with the T on the opposite (or
complementary) DNA strand to form a bond to complete double stranded DNA.
Similarly, guanine (G) and cytosine (C) are complementary because they always
interact with one another. This complementary interaction happens between each
base on the DNA strands.
o DNA is double stranded and antiparallel. DNA strands are “anti-parallel” to one
another, meaning they are parallel (or side-by-side) but run in opposite directions, with
the beginning (or 5’ region) of one DNA strand found at the same location as the end
(or 3’ region) of the opposite DNA strand.
o 5’ and 3’: The 5’ region is the beginning of the DNA or RNA strand, while the 3’ region
is the end of the DNA or RNA strand. Scientists often say that DNA is “read 5’ to 3’”
which means that the sequence is read or examined from the beginning (5’) to the
end (3’), left to right. Genes are also transcribed and translated in a 5’ to 3’ direction.
2 of 8
o Read the sequence of DNA from 5’to 3’
o Read the sequence of mRNA from 5’to 3’
o Read the sequence of polypeptide from N-terminal to C-terminal
o Gene: The unit of heredity. A segment of DNA that codes for a specific protein.
o Coding strand or nontemplate strand or sense strand: Of the two DNA strands, the
coding strand is the strand that has the same sense as the messenger RNA. We can
use the DNA sequence from the coding strand to predict a protein sequence by
looking at the genetic code. This strand is the same sequence as the mRNA. Only
one strand is used for one gene, but different strands can encode different genes.
o Non-coding or template strand or Anti-sense strand: Of the two DNA strands, the anti-
sense strand is the one that is complementary to the strand that encodes a gene. This
strand is used to make the mRNA.
o Messenger RNA (mRNA): The template for protein synthesis. mRNA is created from a
gene through the process of transcription, and is used in the process of translation to
make proteins.
o Transcription: The process by which genetic information is copied from DNA into
mRNA.
o Transfer RNA (tRNA): small molecules of RNA that transfer specific amino acids to a
growing protein chain during protein synthesis (translation), using the information
encoded by the mRNA.
o Nucleotide triplets: Series of three nucleotides in a row, usually called a “codon,” that
specifies the genetic code information for a particular amino acid when translating a
gene into protein. For example, the nucleotide triplet or codon CCG codes for the
amino acid proline (P).
o Start codon: A three-nucleotide triplet or codon that tells the cell when to start protein
translation. The start codon is ATG (or AUG in mRNA).
o Stop codons: A three-nucleotide triplet or codon that tells the cell when to stop protein
translation. The stop codons are TAA, TAG, and TGA (or UAA, UAG, and UGA in
mRNA).
o Protein: A type of molecule found in cells formed from a chain of amino acids encoded
by genes. Proteins perform many of the functions needed in the cell.
o Translation: The process by which mRNA is decoded to produce a protein.
o Non-conservative: When amino acid changes observed between two or more
sequences do not share the same side chain or R-group chemistry.
o Conservative: When amino acid changes observed between two or more sequences
do share the same side chain or R-group chemistry.
o Because of the redundancy of the genetic code, in which multiple codons may encode
the same amino acid, the DNA and corresponding protein sequence may appear to
vary at different rates.]
o mRNA transcripts contain “start” and “stop” codons that initiate and terminate protein
translation, respectively.
o DNA encodes proteins through the processes of transcription and translation. Amino
acids are encoded by three nucleotide triplets called codons. The concepts of “start”
3 of 8
and “stop” codons. Proficiency in the use of the genetic code table, including the use
of single letter amino acid abbreviations, is helpful.
o Reading frame: A reading frame is a contiguous and non-overlapping sequence of
three-nucleotide codons in DNA or RNA. There are three possible reading frames in
an mRNA strand and six in a double-stranded DNA molecule (three reading frames
from each of the two DNA strands).
o Open reading frame: A reading frame that contains a start codon and a stop codon,
with multiple three-nucleotide codons in between. The open reading frame in a
particular region of DNA is the correct reading frame from which to translate the DNA
into protein. The longest open reading frame is the one that’s most likely to be correct.
o DNA sequences can be read in any of six possible reading frames, but only one of
these is usually translated by the cell into a protein. This is called the open reading
frame.
o Genetic analyses and multiple sequence alignments can be generated with either
DNA or protein sequences.
o Common Misconceptions: Translation always starts with the 1st letter of a DNA
sequence, or with the first start codon (AUG/ATG). All DNA codes for protein. Genes
are found only on one of the strands of DNA.
o In silico: An expression used to mean “performed on computer or via computer
simulation.”
Sequence alignment
Alignments provide a powerful way to compare sequences for either evolutionary
relatedness or structural/functional relatedness. Sequences can be compared by either
global or local alignments. Global alignment forces complete alignment of the input
sequences, whereas local alignments will only align their most similar segments. The choice
of method will depend on whether the sequences are presumed to be related over their entire
lengths or to only share isolated regions of homology. The algorithms to carry out global and
local alignments are similar, but the statistics associated with the output are different.
Homologs are structures or objects that share a common evolutionary origin. Objects with
similar structure or function, but no common ancestor are analogs. Homologs can be
classified as orthologs or paralogs. Orthologs are homologous genes from different species
that arose from a common ancestor gene. Paralogs are homologous genes that are the result
of gene duplication within an evolutionary lineage (eg., species) and may have different or
similar functions.
Searching databases
Comparing a sequence against a database to discover similarities is one of the most
frequently used and powerful tools in bioinformatics. In essence, searching a database is the
same as aligning two sequences. However, to perform a pair-wise alignment of the query
sequence with every sequence in the database would be too time consuming. Therefore
programs use heuristic strategies to speed up the analysis are usually used.
FASTA and BLAST are the two most commonly used programs for searching
databases. Both programs use rapid exact match procedures to first identify sequences
which are most likely to be related. These sequences are then subjected to further analyses
4 of 8
by alignment programs. BLAST (basic local alignment search tool) uses an approach based
on matching short sequence fragments and then finds the best local alignments between the
query sequence and the database sequences. Several BLAST programs are available for
different types of sequence matching
BLAST searches of the most current version of GenBank can be performed at the
National Center for Biotechnology Information (NCBI) by cutting and pasting the query
sequence onto their web page (http://www.ncbi.nlm.nih.gov/ BLAST/). The entire database or
subdivisions of the database can be searched. It is also desirable to 'filter' low complexity
sequences such as long runs of repetitive sequence (eg., tracts of poly-alanine, etc.) and
most BLAST programs filter by default. Such regions of sequence can give spuriously high
scores against unrelated proteins. Similarly, iterative searches are prone to contamination by
regions corresponding to coiled-coils or transmembrane domains. These characteristics
might match the initial search and the program then emphasizes these inappropriate
characteristics. The user can also choose cut-off values (i.e., E-values), the number of
matches to report, and the number of alignments to show.
The entire database(s) is searched and sequences with similarity to the query
sequence are identified and ranked accordingly to the alignment score derived from the pair-
wise alignment. In addition to this 'raw or bit score', the output includes a 'statistical score'
and the alignments. The statistical score, or expectation value (E-value), provides a measure
of the expected number of sequences in the database that would achieve a given alignment
score by chance. The E-values are more useful in terms of judging the significance of a
match than the alignment score. E-values for proteins and DNA should be less than 0.1 and
0.0001, respectively, to be considered significant. Examining the alignments will also help in
inferring whether the hit is significant. For example, protein global alignments which show
25% or greater sequence identity over a stretch of at least 80 amino acids should exhibit the
same basic structure. The identified sequences can be also retrieved and subjected to further
analyses. Database searches may assist in identifying an unknown gene or provide clues
about its function.
The normal approach is to compare the unknown sequence with a database of
previously determined sequences, to identify the most closely related sequence or
sequences from the database. This method relies on the fact that sequences of DNA and
proteins both within and between species are related by homology; that is through having
common evolutionary ancestors, and so proteins (and RNA and DNA motifs) tend to occur in
families with related sequences, structures and functions. Two DNA or two protein
5 of 8
sequences will always show some similarity, if only by chance. The principle of searching for
meaningful sequence similarity is to align two sequences, that is, array them one above the
other with instances of the same base or amino acid at the same position.
Multiple Sequence Alignments
multiple sequence alignments involve the alignment of more than two sequences.
The advantage of aligning multiple DNA or protein sequences is that evolutionary
relationships between individual genes or sequences become clearer, and it becomes
possible to identify important residues or motifs from their conservation between genes or
proteins from different organisms (orthologs), or proteins of similar, but not identical function
(homologs, paralogs, . The most commonly used multiple sequence alignment tool is called
Clustal, and, as with the other bioinformatics tools, is available from many centers as a web-
based application. The method works by initially carrying out pairwise alignments between all
the sequences, as in the BLAST method described above, but then uses this information to
gradually combine the sequences, beginning with most similar, and finishing with the most
distant, optimizing the overall match of the sequences as it goes.
Phylogenetic Trees
The usefulness of sequence similarity searching is based on the fact that similar sequences
are related through evolutionary descent. Hence, we can use the similarity between
sequences to infer something about these evolutionary relationships. Since individual species
have separated from each other at different times during evolution, their DNA and protein
sequences should have diverged from each other and show differences that are broadly
proportional to the time since divergence, in other words the sequences should form a
molecular clock. The amount of sequence similarity can therefore be used to generate
phylogenetic relationships, and phylogenetic trees .
Structural Bioinformatics
The mismatch between the number of known protein sequences and the number of
determined three-dimensional structures has led to the development of methods to derive
structure direct from sequence. These include secondary structure prediction and
comparative or homology modeling, which uses sequence similarity between an unknown
and a known structure to derive a plausible structure for the new protein.
Proteomics
The gene sequences provide a limited amount of information about the proteins that
are encoded by the genes. In addition, phenotype will be determined by the proteins and not
the genes (i.e., genotype). Efforts are now being directed towards the characterization of the
proteome (the complete set of proteins found in a cell, tissue or organism). Proteomics
encompasses many types of activities and analyses with a goal of having a complete
understanding of protein function and to apply this knowledge. One objective of proteomics is
to identify the complete complement of proteins. Genomics provides a first step in this
process since all of the genes within a genome can potentially be identified.
6 of 8
Coding or non-template DNA strand 5’ ATG CAT AGC ATG AGG ATC AT 3’
Complement 3’ TAC GTA TCG TAC TCC TAG TA5’
mRNA 5’ AUG CAU AGC AUG AGG AUC AU 3’
Protein N- M H S M R I -C
Reverse 5’ TAC TAG GAG TAC GAT ACG TA 3’
Reverse complement 5’ ATG ATC CTC CTG CTA TGC AT 3’
Composition A=7 : T=5: C=3: G=5
AT =60%: GC=40%:length 20
7 of 8
8 of 8

Mais conteúdo relacionado

Mais procurados

Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityMonica Munoz-Torres
 
Genome Sequencing - Ahmadrezarafati 1395-01-30
Genome Sequencing - Ahmadrezarafati 1395-01-30Genome Sequencing - Ahmadrezarafati 1395-01-30
Genome Sequencing - Ahmadrezarafati 1395-01-30Ahmadreza Rafati Roudsari
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseqDenis C. Bauer
 
Plasmids and bacteriophages
Plasmids and bacteriophagesPlasmids and bacteriophages
Plasmids and bacteriophagesdrsoniabansal
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsAyeshaYousaf20
 
sequence of file formats in bioinformatics
sequence of file formats in bioinformaticssequence of file formats in bioinformatics
sequence of file formats in bioinformaticsnadeem akhter
 
20100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture0820100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture08Computer Science Club
 
RNA-Seq_Presentation
RNA-Seq_PresentationRNA-Seq_Presentation
RNA-Seq_PresentationToyin23
 
Rna lecture
Rna lectureRna lecture
Rna lecturenishulpu
 
Gene expression introduction
Gene expression introductionGene expression introduction
Gene expression introductionSetia Pramana
 
subtractive hybridization
subtractive hybridizationsubtractive hybridization
subtractive hybridizationSakshi Saxena
 
Nucleic Acid Sequence databases
Nucleic Acid Sequence databasesNucleic Acid Sequence databases
Nucleic Acid Sequence databasesPranavathiyani G
 
DNA SEQUENCING METHODS AND STRATEGIES FOR GENOME SEQUENCING
DNA SEQUENCING METHODS AND STRATEGIES FOR GENOME SEQUENCINGDNA SEQUENCING METHODS AND STRATEGIES FOR GENOME SEQUENCING
DNA SEQUENCING METHODS AND STRATEGIES FOR GENOME SEQUENCINGPuneet Kulyana
 
Prediction of protein function from sequence derived protein features
Prediction of protein function from sequence derived protein featuresPrediction of protein function from sequence derived protein features
Prediction of protein function from sequence derived protein featuresLars Juhl Jensen
 

Mais procurados (20)

Apollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research communityApollo - A webinar for the Phascolarctos cinereus research community
Apollo - A webinar for the Phascolarctos cinereus research community
 
Genome Sequencing - Ahmadrezarafati 1395-01-30
Genome Sequencing - Ahmadrezarafati 1395-01-30Genome Sequencing - Ahmadrezarafati 1395-01-30
Genome Sequencing - Ahmadrezarafati 1395-01-30
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseq
 
Plasmids and bacteriophages
Plasmids and bacteriophagesPlasmids and bacteriophages
Plasmids and bacteriophages
 
Genome assembly
Genome assemblyGenome assembly
Genome assembly
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
Apolo Taller en BIOS
Apolo Taller en BIOS Apolo Taller en BIOS
Apolo Taller en BIOS
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomics
 
sequence of file formats in bioinformatics
sequence of file formats in bioinformaticssequence of file formats in bioinformatics
sequence of file formats in bioinformatics
 
20100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture0820100516 bioinformatics kapushesky_lecture08
20100516 bioinformatics kapushesky_lecture08
 
RNA-Seq_Presentation
RNA-Seq_PresentationRNA-Seq_Presentation
RNA-Seq_Presentation
 
31931 31941
31931 3194131931 31941
31931 31941
 
Rna lecture
Rna lectureRna lecture
Rna lecture
 
Gene expression introduction
Gene expression introductionGene expression introduction
Gene expression introduction
 
subtractive hybridization
subtractive hybridizationsubtractive hybridization
subtractive hybridization
 
3 dna libraries
3 dna libraries3 dna libraries
3 dna libraries
 
Nucleic Acid Sequence databases
Nucleic Acid Sequence databasesNucleic Acid Sequence databases
Nucleic Acid Sequence databases
 
DNA SEQUENCING METHODS AND STRATEGIES FOR GENOME SEQUENCING
DNA SEQUENCING METHODS AND STRATEGIES FOR GENOME SEQUENCINGDNA SEQUENCING METHODS AND STRATEGIES FOR GENOME SEQUENCING
DNA SEQUENCING METHODS AND STRATEGIES FOR GENOME SEQUENCING
 
Prediction of protein function from sequence derived protein features
Prediction of protein function from sequence derived protein featuresPrediction of protein function from sequence derived protein features
Prediction of protein function from sequence derived protein features
 
Gen bank (genetic sequence databank)
Gen bank (genetic sequence databank)Gen bank (genetic sequence databank)
Gen bank (genetic sequence databank)
 

Semelhante a Bioinformatics Guide

Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation OverviewPathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation OverviewPathema
 
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptxBTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptxChijiokeNsofor
 
Analysis of Genomic and Proteomic Sequence Using Fir Filter
Analysis of Genomic and Proteomic Sequence Using Fir FilterAnalysis of Genomic and Proteomic Sequence Using Fir Filter
Analysis of Genomic and Proteomic Sequence Using Fir FilterIJMER
 
LECTURE 7.pptx
LECTURE 7.pptxLECTURE 7.pptx
LECTURE 7.pptxericndunek
 
L-1_Nucleic acid.pptx
L-1_Nucleic acid.pptxL-1_Nucleic acid.pptx
L-1_Nucleic acid.pptxMithilaBanik
 
Processing Raw scRNA-Seq Sequencing Data
Processing Raw scRNA-Seq Sequencing DataProcessing Raw scRNA-Seq Sequencing Data
Processing Raw scRNA-Seq Sequencing DataAlireza Doustmohammadi
 
Bioinformatics.Practical Notebook
Bioinformatics.Practical NotebookBioinformatics.Practical Notebook
Bioinformatics.Practical NotebookNaima Tahsin
 
Rna seq and chip seq
Rna seq and chip seqRna seq and chip seq
Rna seq and chip seqJyoti Singh
 
Genome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKGenome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKMonica Munoz-Torres
 
Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07Monica Munoz-Torres
 
Forensic dna typing by John M Butler
Forensic dna typing by John M ButlerForensic dna typing by John M Butler
Forensic dna typing by John M ButlerMuhammad Ahmad
 
Genome organization ,gene expression sand regulation
Genome organization ,gene expression sand regulation Genome organization ,gene expression sand regulation
Genome organization ,gene expression sand regulation sukanyakk
 
Introduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinisIntroduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinisMonica Munoz-Torres
 

Semelhante a Bioinformatics Guide (20)

Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation OverviewPathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
Pathema Burkholderia Annotation Jamboree: Prokaryotic Annotation Overview
 
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptxBTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptx
 
Analysis of Genomic and Proteomic Sequence Using Fir Filter
Analysis of Genomic and Proteomic Sequence Using Fir FilterAnalysis of Genomic and Proteomic Sequence Using Fir Filter
Analysis of Genomic and Proteomic Sequence Using Fir Filter
 
LECTURE 7.pptx
LECTURE 7.pptxLECTURE 7.pptx
LECTURE 7.pptx
 
L-1_Nucleic acid.pptx
L-1_Nucleic acid.pptxL-1_Nucleic acid.pptx
L-1_Nucleic acid.pptx
 
Gene identification using bioinformatic tools.pptx
Gene identification using bioinformatic tools.pptxGene identification using bioinformatic tools.pptx
Gene identification using bioinformatic tools.pptx
 
Processing Raw scRNA-Seq Sequencing Data
Processing Raw scRNA-Seq Sequencing DataProcessing Raw scRNA-Seq Sequencing Data
Processing Raw scRNA-Seq Sequencing Data
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
Bioinformatics.Practical Notebook
Bioinformatics.Practical NotebookBioinformatics.Practical Notebook
Bioinformatics.Practical Notebook
 
Rna seq and chip seq
Rna seq and chip seqRna seq and chip seq
Rna seq and chip seq
 
Finding genes
Finding genesFinding genes
Finding genes
 
genomeannotation-160822182432.pdf
genomeannotation-160822182432.pdfgenomeannotation-160822182432.pdf
genomeannotation-160822182432.pdf
 
Genome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTKGenome Curation using Apollo - Workshop at UTK
Genome Curation using Apollo - Workshop at UTK
 
Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07Apollo Introduction for i5K Groups 2015-10-07
Apollo Introduction for i5K Groups 2015-10-07
 
Est database
Est databaseEst database
Est database
 
Forensic dna typing by John M Butler
Forensic dna typing by John M ButlerForensic dna typing by John M Butler
Forensic dna typing by John M Butler
 
Genome organization ,gene expression sand regulation
Genome organization ,gene expression sand regulation Genome organization ,gene expression sand regulation
Genome organization ,gene expression sand regulation
 
Gene expression profiling
Gene expression profilingGene expression profiling
Gene expression profiling
 
Comparitive genomics
Comparitive genomicsComparitive genomics
Comparitive genomics
 
Introduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinisIntroduction to Apollo: i5K E affinis
Introduction to Apollo: i5K E affinis
 

Mais de Nawfal Aldujaily (20)

Global one health and global threats
Global one health and global threats  Global one health and global threats
Global one health and global threats
 
Biggest Threats and Global health security 2021
Biggest Threats and Global health security  2021Biggest Threats and Global health security  2021
Biggest Threats and Global health security 2021
 
Global health security 2020
Global health security  2020 Global health security  2020
Global health security 2020
 
Syllabus of genetic
Syllabus of genetic Syllabus of genetic
Syllabus of genetic
 
Pedigree analysis
Pedigree analysisPedigree analysis
Pedigree analysis
 
linkage
linkagelinkage
linkage
 
chromosome structure and function
chromosome structure and functionchromosome structure and function
chromosome structure and function
 
Mendelian genetics
Mendelian geneticsMendelian genetics
Mendelian genetics
 
Interactions among organelles
Interactions among organellesInteractions among organelles
Interactions among organelles
 
Cytoplasm 2
Cytoplasm 2Cytoplasm 2
Cytoplasm 2
 
Cytoplasm
CytoplasmCytoplasm
Cytoplasm
 
Cell cycle
Cell cycle Cell cycle
Cell cycle
 
Nucleus
NucleusNucleus
Nucleus
 
cell cell communication
cell cell communicationcell cell communication
cell cell communication
 
plasma membrane
plasma membrane plasma membrane
plasma membrane
 
Cell Chemistry
Cell Chemistry Cell Chemistry
Cell Chemistry
 
Antimicrobial resistance and Biofilm
Antimicrobial resistance and Biofilm Antimicrobial resistance and Biofilm
Antimicrobial resistance and Biofilm
 
Evolution
Evolution Evolution
Evolution
 
Nanobiotechnology
NanobiotechnologyNanobiotechnology
Nanobiotechnology
 
Genomics
GenomicsGenomics
Genomics
 

Último

Chandrapur Call girls 8617370543 Provides all area service COD available
Chandrapur Call girls 8617370543 Provides all area service COD availableChandrapur Call girls 8617370543 Provides all area service COD available
Chandrapur Call girls 8617370543 Provides all area service COD availableDipal Arora
 
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...narwatsonia7
 
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...narwatsonia7
 
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Darjeeling Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Darjeeling Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Darjeeling Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Darjeeling Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...hotbabesbook
 
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...astropune
 
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...narwatsonia7
 
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...jageshsingh5554
 
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...Arohi Goyal
 
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiAlinaDevecerski
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...astropune
 
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual NeedsBangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual NeedsGfnyt
 
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...
Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...
Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...narwatsonia7
 
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service KochiLow Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service KochiSuhani Kapoor
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...Garima Khatri
 
Call Girls Colaba Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls Colaba Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls Colaba Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls Colaba Mumbai ❤️ 9920874524 👈 Cash on Deliverynehamumbai
 

Último (20)

Chandrapur Call girls 8617370543 Provides all area service COD available
Chandrapur Call girls 8617370543 Provides all area service COD availableChandrapur Call girls 8617370543 Provides all area service COD available
Chandrapur Call girls 8617370543 Provides all area service COD available
 
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
 
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...High Profile Call Girls Coimbatore Saanvi☎️  8250192130 Independent Escort Se...
High Profile Call Girls Coimbatore Saanvi☎️ 8250192130 Independent Escort Se...
 
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Siliguri Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Darjeeling Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Darjeeling Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Darjeeling Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Darjeeling Just Call 9907093804 Top Class Call Girl Service Available
 
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
 
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
 
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...
 
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
VIP Service Call Girls Sindhi Colony 📳 7877925207 For 18+ VIP Call Girl At Th...
 
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
 
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
 
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual NeedsBangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
Bangalore Call Girl Whatsapp Number 100% Complete Your Sexual Needs
 
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
 
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCREscort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
 
Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...
Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...
Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...
 
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service KochiLow Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
Low Rate Call Girls Kochi Anika 8250192130 Independent Escort Service Kochi
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
 
Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...
Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...
Russian Call Girls in Delhi Tanvi ➡️ 9711199012 💋📞 Independent Escort Service...
 
Call Girls Colaba Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls Colaba Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls Colaba Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls Colaba Mumbai ❤️ 9920874524 👈 Cash on Delivery
 

Bioinformatics Guide

  • 1. 1 of 8 Bioinformatics Bioinformatics are using of computers to store, organize and analyze biological information (particularly that based on DNA and protein sequence). A large number of different kinds of data have now been incorporated into biological databases and can be said to be part of the domain of bioinformatics:  DNA sequence, RNA sequence, Protein sequence  Data from the analysis of transcription across whole genomes (transcriptomics), and analysis of the expressed protein co mplement of cells or tissues under particular circumstances (proteomics).  Data from other ‘omics experiments, including metabolomics, proteomics, and methods for studying interactions between molecules on a large scale, such as two- hybrid analysis.  Structural data from X-ray crystallography and nuclear magnetic resonance (NMR)  Common Sequence Analyses  compiling and editing sequences  manipulating sequences  complement, reverse, or both  translating  structural analysis  searching for coding regions  physical properties  comparisons and homology searches  aligning sequences  sites and motifs  gene identification Basic information o DNA contains the genetic information that encodes traits. o Complementary: Complementary bases are two bases that are able to pair with each other and create a base pair. In DNA, adenine (A) and thymine (T) are complementary, and the A on one DNA strand interacts with the T on the opposite (or complementary) DNA strand to form a bond to complete double stranded DNA. Similarly, guanine (G) and cytosine (C) are complementary because they always interact with one another. This complementary interaction happens between each base on the DNA strands. o DNA is double stranded and antiparallel. DNA strands are “anti-parallel” to one another, meaning they are parallel (or side-by-side) but run in opposite directions, with the beginning (or 5’ region) of one DNA strand found at the same location as the end (or 3’ region) of the opposite DNA strand. o 5’ and 3’: The 5’ region is the beginning of the DNA or RNA strand, while the 3’ region is the end of the DNA or RNA strand. Scientists often say that DNA is “read 5’ to 3’” which means that the sequence is read or examined from the beginning (5’) to the end (3’), left to right. Genes are also transcribed and translated in a 5’ to 3’ direction.
  • 2. 2 of 8 o Read the sequence of DNA from 5’to 3’ o Read the sequence of mRNA from 5’to 3’ o Read the sequence of polypeptide from N-terminal to C-terminal o Gene: The unit of heredity. A segment of DNA that codes for a specific protein. o Coding strand or nontemplate strand or sense strand: Of the two DNA strands, the coding strand is the strand that has the same sense as the messenger RNA. We can use the DNA sequence from the coding strand to predict a protein sequence by looking at the genetic code. This strand is the same sequence as the mRNA. Only one strand is used for one gene, but different strands can encode different genes. o Non-coding or template strand or Anti-sense strand: Of the two DNA strands, the anti- sense strand is the one that is complementary to the strand that encodes a gene. This strand is used to make the mRNA. o Messenger RNA (mRNA): The template for protein synthesis. mRNA is created from a gene through the process of transcription, and is used in the process of translation to make proteins. o Transcription: The process by which genetic information is copied from DNA into mRNA. o Transfer RNA (tRNA): small molecules of RNA that transfer specific amino acids to a growing protein chain during protein synthesis (translation), using the information encoded by the mRNA. o Nucleotide triplets: Series of three nucleotides in a row, usually called a “codon,” that specifies the genetic code information for a particular amino acid when translating a gene into protein. For example, the nucleotide triplet or codon CCG codes for the amino acid proline (P). o Start codon: A three-nucleotide triplet or codon that tells the cell when to start protein translation. The start codon is ATG (or AUG in mRNA). o Stop codons: A three-nucleotide triplet or codon that tells the cell when to stop protein translation. The stop codons are TAA, TAG, and TGA (or UAA, UAG, and UGA in mRNA). o Protein: A type of molecule found in cells formed from a chain of amino acids encoded by genes. Proteins perform many of the functions needed in the cell. o Translation: The process by which mRNA is decoded to produce a protein. o Non-conservative: When amino acid changes observed between two or more sequences do not share the same side chain or R-group chemistry. o Conservative: When amino acid changes observed between two or more sequences do share the same side chain or R-group chemistry. o Because of the redundancy of the genetic code, in which multiple codons may encode the same amino acid, the DNA and corresponding protein sequence may appear to vary at different rates.] o mRNA transcripts contain “start” and “stop” codons that initiate and terminate protein translation, respectively. o DNA encodes proteins through the processes of transcription and translation. Amino acids are encoded by three nucleotide triplets called codons. The concepts of “start”
  • 3. 3 of 8 and “stop” codons. Proficiency in the use of the genetic code table, including the use of single letter amino acid abbreviations, is helpful. o Reading frame: A reading frame is a contiguous and non-overlapping sequence of three-nucleotide codons in DNA or RNA. There are three possible reading frames in an mRNA strand and six in a double-stranded DNA molecule (three reading frames from each of the two DNA strands). o Open reading frame: A reading frame that contains a start codon and a stop codon, with multiple three-nucleotide codons in between. The open reading frame in a particular region of DNA is the correct reading frame from which to translate the DNA into protein. The longest open reading frame is the one that’s most likely to be correct. o DNA sequences can be read in any of six possible reading frames, but only one of these is usually translated by the cell into a protein. This is called the open reading frame. o Genetic analyses and multiple sequence alignments can be generated with either DNA or protein sequences. o Common Misconceptions: Translation always starts with the 1st letter of a DNA sequence, or with the first start codon (AUG/ATG). All DNA codes for protein. Genes are found only on one of the strands of DNA. o In silico: An expression used to mean “performed on computer or via computer simulation.” Sequence alignment Alignments provide a powerful way to compare sequences for either evolutionary relatedness or structural/functional relatedness. Sequences can be compared by either global or local alignments. Global alignment forces complete alignment of the input sequences, whereas local alignments will only align their most similar segments. The choice of method will depend on whether the sequences are presumed to be related over their entire lengths or to only share isolated regions of homology. The algorithms to carry out global and local alignments are similar, but the statistics associated with the output are different. Homologs are structures or objects that share a common evolutionary origin. Objects with similar structure or function, but no common ancestor are analogs. Homologs can be classified as orthologs or paralogs. Orthologs are homologous genes from different species that arose from a common ancestor gene. Paralogs are homologous genes that are the result of gene duplication within an evolutionary lineage (eg., species) and may have different or similar functions. Searching databases Comparing a sequence against a database to discover similarities is one of the most frequently used and powerful tools in bioinformatics. In essence, searching a database is the same as aligning two sequences. However, to perform a pair-wise alignment of the query sequence with every sequence in the database would be too time consuming. Therefore programs use heuristic strategies to speed up the analysis are usually used. FASTA and BLAST are the two most commonly used programs for searching databases. Both programs use rapid exact match procedures to first identify sequences which are most likely to be related. These sequences are then subjected to further analyses
  • 4. 4 of 8 by alignment programs. BLAST (basic local alignment search tool) uses an approach based on matching short sequence fragments and then finds the best local alignments between the query sequence and the database sequences. Several BLAST programs are available for different types of sequence matching BLAST searches of the most current version of GenBank can be performed at the National Center for Biotechnology Information (NCBI) by cutting and pasting the query sequence onto their web page (http://www.ncbi.nlm.nih.gov/ BLAST/). The entire database or subdivisions of the database can be searched. It is also desirable to 'filter' low complexity sequences such as long runs of repetitive sequence (eg., tracts of poly-alanine, etc.) and most BLAST programs filter by default. Such regions of sequence can give spuriously high scores against unrelated proteins. Similarly, iterative searches are prone to contamination by regions corresponding to coiled-coils or transmembrane domains. These characteristics might match the initial search and the program then emphasizes these inappropriate characteristics. The user can also choose cut-off values (i.e., E-values), the number of matches to report, and the number of alignments to show. The entire database(s) is searched and sequences with similarity to the query sequence are identified and ranked accordingly to the alignment score derived from the pair- wise alignment. In addition to this 'raw or bit score', the output includes a 'statistical score' and the alignments. The statistical score, or expectation value (E-value), provides a measure of the expected number of sequences in the database that would achieve a given alignment score by chance. The E-values are more useful in terms of judging the significance of a match than the alignment score. E-values for proteins and DNA should be less than 0.1 and 0.0001, respectively, to be considered significant. Examining the alignments will also help in inferring whether the hit is significant. For example, protein global alignments which show 25% or greater sequence identity over a stretch of at least 80 amino acids should exhibit the same basic structure. The identified sequences can be also retrieved and subjected to further analyses. Database searches may assist in identifying an unknown gene or provide clues about its function. The normal approach is to compare the unknown sequence with a database of previously determined sequences, to identify the most closely related sequence or sequences from the database. This method relies on the fact that sequences of DNA and proteins both within and between species are related by homology; that is through having common evolutionary ancestors, and so proteins (and RNA and DNA motifs) tend to occur in families with related sequences, structures and functions. Two DNA or two protein
  • 5. 5 of 8 sequences will always show some similarity, if only by chance. The principle of searching for meaningful sequence similarity is to align two sequences, that is, array them one above the other with instances of the same base or amino acid at the same position. Multiple Sequence Alignments multiple sequence alignments involve the alignment of more than two sequences. The advantage of aligning multiple DNA or protein sequences is that evolutionary relationships between individual genes or sequences become clearer, and it becomes possible to identify important residues or motifs from their conservation between genes or proteins from different organisms (orthologs), or proteins of similar, but not identical function (homologs, paralogs, . The most commonly used multiple sequence alignment tool is called Clustal, and, as with the other bioinformatics tools, is available from many centers as a web- based application. The method works by initially carrying out pairwise alignments between all the sequences, as in the BLAST method described above, but then uses this information to gradually combine the sequences, beginning with most similar, and finishing with the most distant, optimizing the overall match of the sequences as it goes. Phylogenetic Trees The usefulness of sequence similarity searching is based on the fact that similar sequences are related through evolutionary descent. Hence, we can use the similarity between sequences to infer something about these evolutionary relationships. Since individual species have separated from each other at different times during evolution, their DNA and protein sequences should have diverged from each other and show differences that are broadly proportional to the time since divergence, in other words the sequences should form a molecular clock. The amount of sequence similarity can therefore be used to generate phylogenetic relationships, and phylogenetic trees . Structural Bioinformatics The mismatch between the number of known protein sequences and the number of determined three-dimensional structures has led to the development of methods to derive structure direct from sequence. These include secondary structure prediction and comparative or homology modeling, which uses sequence similarity between an unknown and a known structure to derive a plausible structure for the new protein. Proteomics The gene sequences provide a limited amount of information about the proteins that are encoded by the genes. In addition, phenotype will be determined by the proteins and not the genes (i.e., genotype). Efforts are now being directed towards the characterization of the proteome (the complete set of proteins found in a cell, tissue or organism). Proteomics encompasses many types of activities and analyses with a goal of having a complete understanding of protein function and to apply this knowledge. One objective of proteomics is to identify the complete complement of proteins. Genomics provides a first step in this process since all of the genes within a genome can potentially be identified.
  • 6. 6 of 8 Coding or non-template DNA strand 5’ ATG CAT AGC ATG AGG ATC AT 3’ Complement 3’ TAC GTA TCG TAC TCC TAG TA5’ mRNA 5’ AUG CAU AGC AUG AGG AUC AU 3’ Protein N- M H S M R I -C Reverse 5’ TAC TAG GAG TAC GAT ACG TA 3’ Reverse complement 5’ ATG ATC CTC CTG CTA TGC AT 3’ Composition A=7 : T=5: C=3: G=5 AT =60%: GC=40%:length 20