This document discusses various bioinformatics tools used for genomics, proteomics, and metabolomics. It begins with an introduction to bioinformatics and defines key terms. It then describes several important databases and resources for nucleotide and protein sequences, including NCBI, GenBank, and KEGG. Important analytical tools like BLAST and Clustal are also mentioned. Subsequent chapters discuss genomics, proteomics, and metabolomics in more detail and give examples of specific tools used for each, including KNApSAcK, MetaboAnalyst, and PSI-PRED. The document aims to outline the key concepts and computational tools involved in these three areas of bioinformatics.
5. Bioinformatics and Tools
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics, and statistics to analyze and interpret biological data.

The term "bioinformatics" was coined by Paulien Hogeweg and Ben Hesper in 1970 and defined as "the study of informatic processes in biotic systems".

Bioinformatics tools are software programs designed to extract meaningful information from the mass of data in molecular biology and biological databases, and to carry out sequence or structural analysis.
6. Bioinformatics Tools
The European Molecular Biology Laboratory (EMBL), the National Center for Biotechnology Information (NCBI), and the DNA Data Bank of Japan (DDBJ) have been catering to the needs of researchers around the globe for decades, and the databases and tools hosted by these institutes continue to grow rapidly.

The Kyoto Encyclopedia of Genes and Genomes (KEGG) attempts to understand higher-order biological functions by integrating gene, protein, and metabolic pathway information.

Databases such as GenBank, Phytozome, the EMBL Nucleotide Sequence Database, Swiss-Prot, and the UniProt Knowledgebase, on the other hand, store huge amounts of nucleotide and protein sequence information that is readily accessible to the public.
Analytical tools such as BLAST and CLUSTAL have been the workhorses for sequence data analysis.
8. Genomics
Genomics is the study of the whole genomes of organisms, and incorporates elements from genetics.
Genomics uses a combination of recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the structure and function of genomes.
Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes.
A genome is an organism's complete set of DNA, including all of its genes.
9. Genomics Tools
• Genomics uses recombinant DNA technology to analyze the structure and function of the complete set of DNA within an organism.
• This includes using electrophoresis and purification systems to isolate DNA templates, PCR and sequencing to determine the sequence and map of the DNA base code, microarrays and genotyping to determine the similarities and differences between sequences, mass spectrometry for the analysis of oligonucleotides, and next-generation sequencers to analyze whole genomes.
• Most laboratories use some kind of genomic tool in their research, clinical, or forensic applications.
10. Database
"A biological database is a large, organized body of persistent data,
usually associated with computerized software designed to update,
query, and retrieve components of the data stored within the
system."
Examples:
NCBI
EMBL
DDBJ
FASTA
FASTA format is a text-based format for representing either
nucleotide sequences or amino acid (protein) sequences, in which
nucleotides or amino acids are represented using single-letter
codes.
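Because FASTA is plain text (a ">" header line followed by one or more lines of single-letter codes), it can be read with a few lines of code. The parser below is a minimal sketch that assumes well-formed input; the record names and sequences in the example are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    A line starting with '>' opens a new record; every following
    non-blank line up to the next '>' belongs to that record's sequence.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # normalise single-letter codes to upper case
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>seq1 hypothetical nucleotide record
ATGGCC
ATTGTA
>seq2 hypothetical protein record
MKTAYIAKQR
"""

for name, seq in parse_fasta(example.splitlines()):
    print(name, len(seq), seq)
```

Sequence lines are joined, so a sequence wrapped over several lines comes back as one string.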
11. NCBI
• The NCBI houses a series of
databases relevant to biotechnology
and biomedicine and is an important
resource for bioinformatics tools and
services.
• Major databases include GenBank
for DNA sequences and PubMed, a
bibliographic database for
biomedical literature.
• Other databases include the NCBI
Epigenomics database. All these
databases are available online
through the Entrez search engine.
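Entrez databases can also be queried from a program through NCBI's E-utilities web service. The sketch below only builds an `efetch` URL that asks for a record in FASTA format; downloading it needs network access. The accession is a placeholder to be replaced with a real one.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_fasta_url(db: str, accession: str) -> str:
    """Build an E-utilities efetch URL that returns one record as plain FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


def fetch_fasta(db: str, accession: str) -> str:
    """Download the record. Needs network access; keep request rates low
    (NCBI allows about three requests per second without an API key)."""
    with urlopen(efetch_fasta_url(db, accession)) as response:
        return response.read().decode("utf-8")


# "YOUR_ACCESSION" is a placeholder: substitute a real protein accession.
url = efetch_fasta_url("protein", "YOUR_ACCESSION")
print(url)
```

Switching `db` to `"nucleotide"` retrieves a nucleotide record in the same way.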
12. GenBank
GenBank® is a comprehensive
database that contains publicly
available nucleotide sequences for
more than 300 000 organisms named
at the genus level or lower, obtained
primarily through submissions from
individual laboratories and batch
submissions from large-scale
sequencing projects, including whole
genome shotgun
13. BLAST
BLAST is an acronym for Basic Local Alignment Search Tool and refers to a suite of programs used to generate alignments between a nucleotide or protein sequence, referred to as the "query", and nucleotide or protein sequences within a database, referred to as "subject" sequences. BLAST finds regions of local similarity between sequences: the program compares nucleotide or protein sequences and calculates the statistical significance of the matches.
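To make "local alignment" concrete, the sketch below computes a Smith-Waterman local alignment score, the exact dynamic-programming score that BLAST approximates. BLAST itself uses a faster seed-and-extend heuristic and substitution matrices such as BLOSUM62 for proteins; the simple match/mismatch/gap values here are assumptions chosen for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # a[i-1] aligned to a gap
            left = curr[j - 1] + gap    # b[j-1] aligned to a gap
            # Flooring at 0 is what makes the alignment local:
            # a poor prefix is dropped and a new alignment can start here.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("TTTACGTTTT", "GGACGTGG"))  # → 8 (shared ACGT core, 4 matches x 2)
```

Unrelated flanking sequence does not lower the score, which is why local alignment suits finding a conserved region inside otherwise different sequences.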
14. Types of BLAST
• tBLASTn (protein sequence searched against translated nucleotide sequences): compares a protein query against a nucleotide database translated in all six reading frames.
• BLASTp (Protein BLAST): compares one or more protein query sequences to a subject protein sequence or a database of protein sequences.
• BLASTx (translated nucleotide sequence searched against protein sequences): compares a nucleotide query, translated in all six reading frames, against a database of protein sequences.
• BLASTn (Nucleotide BLAST): compares one or more nucleotide query sequences to a subject nucleotide sequence or a database of nucleotide sequences.
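The choice among these programs depends only on the query type and the database type. The sketch below encodes that table and builds an argument list for a local BLAST+ run; it assumes BLAST+ is installed and a database has been built with `makeblastdb`. The file and database names are hypothetical, and the E-value cutoff is an arbitrary default. (BLAST+ also offers `tblastx`, which translates both sides; it is not covered on this slide.)

```python
from typing import List

# Query type + database type -> BLAST+ program name (the four programs above).
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database translated in six frames
}


def choose_program(query_type: str, db_type: str) -> str:
    key = (query_type.lower(), db_type.lower())
    if key not in PROGRAMS:
        raise ValueError(f"unsupported combination: {key}")
    return PROGRAMS[key]


def blast_command(query_type: str, db_type: str, query_file: str,
                  db_name: str, evalue: float = 1e-5) -> List[str]:
    """Argument list for a local BLAST+ run with tabular output (-outfmt 6)."""
    return [choose_program(query_type, db_type),
            "-query", query_file,
            "-db", db_name,
            "-evalue", str(evalue),
            "-outfmt", "6"]


# File and database names are hypothetical placeholders.
print(" ".join(blast_command("nucleotide", "protein", "query.fasta", "my_protein_db")))
```

The list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting problems with file names.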
15. Genome Data Viewer
The NCBI Genome Data Viewer (GDV) is a genome browser supporting the exploration and analysis of eukaryotic RefSeq genome assemblies.
• Finding organisms and genome assemblies
• Using the browser
17. Proteomics
• Proteomics: a branch of biotechnology concerned with applying the techniques of molecular biology, biochemistry, and genetics to analyzing the structure and function of the proteins produced by the genes of a particular cell, tissue, or organism, and with organizing that information in databases.
• It is the study of the full complement of proteins at a given time. An organism has one genome but multiple proteomes, because the set of proteins present changes between cell types and conditions.
• Proteomics is a multidisciplinary field spanning biology, bioinformatics, molecular biology, analytical chemistry, and protein biochemistry.
• Types of proteomics:
1. Interaction proteomics
2. Expression proteomics
18. Some modelling tools and databases:
• SWISS-MODEL
• MODELLER
• PROSPECT
• InterProScan
• BLAST
• ModBase
19. Protein computational analysis:
1. First step: obtain the protein sequence. We can take the protein sequence from a database, or translate a gene sequence into protein; in both cases, we should ultimately use FASTA format.
How do we get a protein sequence from a database?
We will take the example of NCBI, one of the largest and most frequently used nucleotide and protein databases. Records can be downloaded in two formats:
1. GenBank
2. FASTA
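The "translate a gene sequence into protein" route can be sketched as follows. This assumes the standard genetic code (NCBI translation table 1), reads only the first forward reading frame, and stops at the first stop codon by default; real genes may need another frame, another genetic code, or intron removal first. The DNA and the FASTA header are hypothetical examples.

```python
# Standard genetic code, built from the usual TCAG-ordered 64-letter string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate reading frame 1; '*' marks a stop codon, 'X' an unreadable codon."""
    seq = dna.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record, wrapping lines at `width` letters."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(to_fasta("hypothetical_orf", translate(dna)))
```

With the example above, translation stops at the TGA codon and the record printed is `>hypothetical_orf` followed by `MAIVMGR`, which is ready to paste into FASTA-based tools.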
2. Second step: in this step, we try to figure out the physicochemical properties of our target protein, including molecular weight, isoelectric point, stability, etc. For this we use a very popular tool, Expasy ProtParam. The input is the protein sequence in FASTA format rather than GenBank format.
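To show what a ProtParam-style molecular weight means, the sketch below sums average amino acid residue masses and adds one water molecule for the free termini. The mass table is the standard set of average residue masses; results should be close to, though not guaranteed identical with, ProtParam's own output. Isoelectric point and stability estimates need further tables and are left to the tool.

```python
from collections import Counter
from typing import Dict

# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the N- and C-terminal H and OH


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    seq = protein.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


def composition(protein: str) -> Dict[str, float]:
    """Percentage of each residue, rounded to one decimal place."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: round(100 * n / total, 1) for aa, n in sorted(counts.items())}


seq = "MKTAYIAKQR"  # hypothetical peptide
print(round(molecular_weight(seq), 2), composition(seq))
```

Post-translational modifications and disulfide bonds are ignored, so the figure is for the bare chain.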
20. Prediction of Secondary structure:
There are three different structural levels of proteins: primary, secondary, and tertiary structure. Secondary structure consists of different types of elements: alpha helices, beta sheets, and loops. Two different types of algorithms are used to predict secondary structure elements:
1. Ab-initio Based
2. Homology based
21. Protein structure analysis:
Ab initio algorithms are standalone algorithms that identify secondary structure elements using the intrinsic tendencies of amino acids to adopt particular conformations. For example, glycine and proline strongly tend to occur in loops.
Homology-based algorithms make predictions based on the secondary structure of homologous proteins, because structure is more conserved than sequence.
PSI-PRED: a homology-based tool
We use PSI-PRED, which builds a sequence profile from homologues found by PSI-BLAST, to predict secondary structure.
22. Signal peptide prediction:
A signal peptide determines the localization of a protein in the cell. It is present at the N-terminus of newly synthesized proteins and contains a stretch of predominantly hydrophobic amino acids. Predicting it is important, especially if we want to clone the gene into an expression system. Similar prediction algorithms are used for signal peptides, and the input sequence is in FASTA format.
Is our protein transmembrane?
In the cell, proteins are broadly either globular or transmembrane. Transmembrane proteins can be alpha-helical or beta-barrels. We use the TMHMM algorithm to predict the structure of transmembrane proteins.
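TMHMM is a hidden Markov model, which is beyond a short example. The sketch below swaps in the older Kyte-Doolittle sliding-window hydropathy heuristic to show the underlying idea: a membrane-spanning helix is a run of roughly 19 to 20 hydrophobic residues. The window of 19 and threshold of 1.6 are the commonly quoted Kyte-Doolittle settings, treated here as assumptions. A hydrophobic signal peptide near the N-terminus can also trigger this heuristic, which is one reason dedicated predictors exist. The test sequence is hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every window of `window` consecutive residues."""
    try:
        scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    except KeyError as err:
        raise ValueError(f"non-standard residue: {err}") from None
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge overlapping hydrophobic windows into (start, end) ranges (0-based, end exclusive)."""
    segments: List[Tuple[int, int]] = []
    for i, avg in enumerate(hydropathy_profile(seq, window)):
        if avg > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], max(segments[-1][1], end))
            else:
                segments.append((start, end))
    return segments


toy = "K" * 10 + "L" * 20 + "K" * 10  # hypothetical: hydrophobic core between charged flanks
print(candidate_tm_segments(toy))
```

Because whole windows are merged, the reported range extends a few residues past the hydrophobic core; it marks a region worth checking with a real predictor, not exact helix boundaries.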
23. Prediction of domains and motifs:
InterPro is a tool used to predict domains in protein sequences. The input sequence is in FASTA format, and InterPro uses homology searches to identify domains and motifs.
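InterPro combines signatures from several member databases, one of which is PROSITE, whose motif patterns are written in a regular-expression-like syntax. The sketch below converts simple PROSITE patterns into Python regular expressions and scans a sequence with them. It covers only the basic syntax (`x`, `[..]`, `{..}`, repeats, `<` and `>` anchors) and none of the profile or HMM signatures that InterPro mostly relies on. The pattern N-{P}-[ST]-{P} is PROSITE's N-glycosylation site (PS00001); the sequence is hypothetical.

```python
import re
from typing import List


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') to a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)  # x(2) or x(2,4)
        if m:
            element = m.group(1)
            repeat = "{" + m.group(2) + ("," + m.group(3) if m.group(3) else "") + "}"
        if element == "x":
            core = "."                           # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"    # any residue except those listed
        else:
            core = element                       # single residue or [..] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motif(seq: str, pattern: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) matches."""
    regex = prosite_to_regex(pattern)
    return [m.start() for m in re.finditer(f"(?={regex})", seq)]


print(find_motif("MKNGSAANPSLNVTE", "N-{P}-[ST]-{P}"))
```

The lookahead in `find_motif` reports overlapping matches, which matters for short motifs that can occur back to back.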
3D structure prediction of proteins:
Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence, that is, the prediction of its folding and its secondary and tertiary structure from its primary structure. Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine (for example, in drug design) and in biotechnology (for example, in the design of novel enzymes).
24. Protein 3D analysis:
3D structure prediction is one of the most complicated computational processes. There are three main experimental techniques to determine the 3D structure of proteins:
X-ray crystallography
NMR
Cryo-Electron Microscopy
26. Metabolomics
• The scientific study of the set of metabolites present within an organism, cell, or tissue.
• "The potential of metabolomics in the early detection of cancer is also being explored."
• The metabolome refers to the sum of all low-molecular-weight metabolites of a cell during a specific physiological period, and is an integral part of a comprehensive understanding of a biological system. If genomics and proteomics tell us about possible events, then metabolomics can tell us what actually happened.
28. Metabolomics Tools
Here are some metabolomics tools:
1. KNApSAcK
2. MetaboAnalyst
3. MapMan
4. LIPID MAPS
KNApSAcK:
The purpose of KNApSAcK Metabolomics is to search for metabolites by MS peak, molecular weight, molecular formula, and species. It consists of the KNApSAcK Metabolomics Search Engine and the KNApSAcK Core System.
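The core of a search "by molecular formula and MS peak" is computing a formula's monoisotopic mass and comparing it with an observed mass within a tolerance in parts per million (ppm). The sketch below does this against a tiny, hypothetical local candidate list; it does not query KNApSAcK or use its data. It also assumes the observed value is already a neutral monoisotopic mass (no adduct correction) and uses a 5 ppm tolerance as an arbitrary default.

```python
import re
from typing import List

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}

# Hypothetical toy candidate list (name -> molecular formula).
CANDIDATES = {"glucose": "C6H12O6", "fructose": "C6H12O6",
              "caffeine": "C8H10N4O2", "citric acid": "C6H8O7"}


def monoisotopic_mass(formula: str) -> float:
    """Mass of a simple formula such as 'C6H12O6' (no brackets or charges)."""
    total, pos = 0.0, 0
    for m in re.finditer(r"([A-Z][a-z]?)(\d*)", formula):
        element = m.group(1)
        if m.start() != pos or element not in MONOISOTOPIC:
            raise ValueError(f"cannot parse formula: {formula!r}")
        total += MONOISOTOPIC[element] * int(m.group(2) or 1)
        pos = m.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return total


def match_mass(observed: float, tolerance_ppm: float = 5.0) -> List[str]:
    """Candidate names whose theoretical mass lies within the ppm tolerance."""
    hits = []
    for name, formula in CANDIDATES.items():
        theoretical = monoisotopic_mass(formula)
        if abs(observed - theoretical) / theoretical * 1e6 <= tolerance_ppm:
            hits.append(name)
    return sorted(hits)


print(match_mass(180.0634))
```

Glucose and fructose share the formula C6H12O6, so a mass alone returns both; species information, retention time, or fragmentation spectra are needed to tell isomers apart.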
29. KNApSAcK
The KNApSAcK package, when installed on the user's computer, provides tools for analyzing the user's own mass spectra datasets prepared in a particular format, as well as for retrieving information on metabolites by entering the name of a metabolite, the name of an organism, a molecular weight, or a molecular formula.
30. MetaboAnalyst
MetaboAnalyst is a set of online tools for:
• metabolomic data analysis
• interpretation
It was created by members of the Wishart Research Group at the University of Alberta. It was first released in May 2009, and version 2.0 was released in January 2012.
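Metabolomic data analysis usually starts by putting metabolites measured on very different scales onto comparable footing; MetaboAnalyst offers normalization options including auto scaling and Pareto scaling. The sketch below shows what those two transformations compute, as a local illustration rather than MetaboAnalyst's own code. Rows are samples, columns are metabolites, the sample standard deviation (n - 1) is used, and the data matrix is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = metabolites


def _scale(matrix: Matrix, pareto: bool) -> Matrix:
    """Mean-centre each column, then divide by its SD (auto) or sqrt(SD) (Pareto)."""
    stats = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        sd = stdev(column)  # sample standard deviation, needs at least 2 samples
        if sd == 0:
            raise ValueError(f"column {j} has zero variance")
        stats.append((mean(column), sqrt(sd) if pareto else sd))
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in matrix]


def autoscale(matrix: Matrix) -> Matrix:
    """Unit-variance scaling: every metabolite gets mean 0 and SD 1."""
    return _scale(matrix, pareto=False)


def pareto_scale(matrix: Matrix) -> Matrix:
    """Pareto scaling: shrinks large-variance metabolites less aggressively."""
    return _scale(matrix, pareto=True)


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # hypothetical intensities
print(autoscale(data))
print(pareto_scale(data))
```

Auto scaling gives every metabolite equal weight, which can inflate noisy low-abundance signals; Pareto scaling is a compromise that keeps some of the original magnitude differences.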