2.
Bioinformatics
The science of collecting, analyzing, and conceptualizing biological data by applying informatics techniques.
Biology + Informatics = Bioinformatics
4.
Goals of Bioinformatics
Manage biological information
Organize biological information using databases
Process, analyze, and visualize biological data
Share biological information with the public over the Internet
5.
Definition
Bio – informatics
Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules on a large scale.
Bioinformatics is a practical discipline with many applications.
7.
Biological Information
Central Dogma of Molecular Biology
DNA -> RNA -> Protein -> Phenotype -> DNA
Molecules: Sequence, Structure, Function, Interaction
Processes: Mechanism, Specificity, Regulation
Central Paradigm for Bioinformatics
Genomic Sequence Information -> mRNA (level) -> Protein Sequence -> Protein Structure -> Protein Function -> Protein Interaction -> Phenotype
Large Amounts of Information: Statistical, Computer Processing
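The DNA -> RNA -> Protein steps above can be illustrated with a minimal sketch. It assumes the standard genetic code and a coding-strand DNA input; the function names and the toy sequence are made up for illustration and are not from the slides.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard genetic code, '*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_gene = "ATGGCCTGGTAA"  # hypothetical sequence, for illustration only
mrna = transcribe(toy_gene)
print(mrna, translate(mrna))
```

Real pipelines also have to handle splicing, alternative genetic codes, and ambiguous bases, none of which this sketch attempts.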
10.
Human Genome Project
Could not have been achieved without bioinformatics
Goals
Sequence the 3 billion DNA subunits
Discover all the human genes
Make them accessible for further biological study
And then?
Need to bring together and store vast amounts of information from
Lab equipment and experiments
Computer analysis
Human analysis
Make it visible to the world’s scientists
11.
How to Analyze Information
Data
– Management
– Analysis
– Derive a hypothesis
– Design and implement an in silico experiment
– Confirm in the wet lab
12.
Why Bioinformatics
1. Find an answer quickly
Most in silico biology is faster than in vitro
2. Massive amounts of data to analyze
Need to make use of all information
Not possible to do the analysis by hand
Can’t organize and store information using only lab notebooks
Automation is key
However!
Verification?
13.
Applications
1. Computational biology
Computing methods for classical biology
Primarily concerned with evolutionary, population, and theoretical biology; cellular/molecular biology?
2. Medical informatics
Computing methods to improve communication, understanding, and management of medical data
Data manipulation
14.
Applications (continued)
3. Chemoinformatics
Chemical and biological technology for drug design and development
4. Genomics
Analysis and comparison of the entire genome of a single species or of multiple species
Genomics existed before any genomes were completely sequenced, but in a very primitive state
15.
Applications (continued)
5. Proteomics
Study of how the genome is expressed in proteins, and of how these proteins function and interact
Concerned with the actual states of specific cells, rather than the potential states described by the genome
6. Pharmacogenomics
The application of genomic methods to identify drug targets
For example, by searching entire genomes for potential drug receptors or by studying gene expression patterns in tumors
16.
7. Pharmacogenetics
The use of genomic methods to determine what causes variations in individual response to drug treatments
The goal is to identify drugs that may be effective only for subsets of patients, or to tailor drugs for specific individuals or groups
19.
How do we identify a gene in a genome?
A gene is characterized by several features (promoter, ORF…)
Some are easier and some harder to detect…
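One of the easier features to detect is the ORF (open reading frame). The sketch below is a deliberately naive illustration: it scans only the three forward-strand reading frames for an ATG followed by an in-frame stop codon, and ignores promoters, introns, and the reverse strand. The function name, the `min_codons` parameter, and the toy sequence are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    The stop codon is included and 'end' is exclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


toy_genome = "CCATGAAATTTTAGGG"  # hypothetical sequence, for illustration only
for start, end in find_orfs(toy_genome):
    print(start, end, toy_genome[start:end])
```

Finding real eukaryotic genes is considerably harder than this, because exons are interrupted by introns and regulatory signals are weak.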
21.
How human are chimps?
Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%
Perhaps not surprising!
22.
So where are we different?
Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGG--GGATGCGGGCCCTATACCC
Mouse ATAGCG---GGATGCGGCGC-TATACCA
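Differences in an alignment like the one above can be counted column by column. The sketch below is a minimal illustration: it takes the three rows from the slide with the spaces inside the gaps removed, and reports matches, mismatches, and gap columns against the human row. The function names are made up for this example, and excluding gap columns from percent identity is one convention among several.

```python
def compare_aligned(a: str, b: str, gap: str = "-"):
    """Count (matches, mismatches, gap columns) between two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def percent_identity(a: str, b: str) -> float:
    """Percent identical columns, ignoring columns that contain a gap."""
    matches, mismatches, _ = compare_aligned(a, b)
    compared = matches + mismatches
    return 100.0 * matches / compared if compared else 0.0


# Rows as shown on the slide, with spaces inside the gaps removed.
HUMAN = "ATAGCGGGGGGATGCGGGCCCTATACCC"
CHIMP = "ATAGGGG - - GGATGCGGGCCCTATACCC".replace(" ", "")
MOUSE = "ATAGCG - - - GGATGCGGCGC -TATACCA".replace(" ", "")

if __name__ == "__main__":
    for name, row in (("Chimp", CHIMP), ("Mouse", MOUSE)):
        print(name, compare_aligned(HUMAN, row), round(percent_identity(HUMAN, row), 1))
```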
24.
The three-dimensional structure of a protein can tell much more than the sequence alone
Protein-ligand complexes
Functional sites
Fold
Evolutionary relationship
Shape and electrostatics
Active sites
Protein complexes
Biological processes
25.
Resources and Databases
The different types of data are collected in databases
Sequence databases
Structural databases
Databases of experimental results
All databases are connected
27.
Structure Databases
3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
3-D data is available thanks to techniques such as NMR and X-ray crystallography
28.
Databases of Experimental Results
Data such as experimental microarray images (gene expression data)
Proteomic data (protein expression data)
Metabolic pathways, protein-protein interaction data, regulatory networks
29.
Literature Databases
PubMed
Service of the National Library of Medicine
http://www.ncbi.nlm.nih.gov/pubmed/
30.
Putting it all Together
Each database contains specific information
Like other biological systems, these databases are interrelated
32.
Applications I: Genomics
• Finding Genes in Genomic DNA
– Introns
– Exons
– Promoters
• Characterizing Repeats in Genomic DNA
– Statistics
– Patterns
• Expression Analysis
– Time Course Clustering
– Identifying Regulatory Regions
– Measuring Differences
• Genome Comparisons
– Ortholog Families
– Genome Annotation
– Evolutionary Phylogenetic Trees
• Characterizing Intergenic Regions
– Finding Pseudogenes
– Patterns
• Duplications in the Genome
– Large-Scale Genomic Alignment
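As a small illustration of the "statistics" and "patterns" side of characterizing repeats, the sketch below counts overlapping k-mers in a DNA string and reports those that occur more than once. This is a toy stand-in for real repeat-finding tools; the function names and the example sequence are made up for illustration.

```python
from collections import Counter


def kmer_counts(dna: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    dna = dna.upper()
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))


def repeated_kmers(dna: str, k: int, min_count: int = 2) -> dict:
    """Return the k-mers that occur at least min_count times."""
    return {kmer: n for kmer, n in kmer_counts(dna, k).items() if n >= min_count}


toy_sequence = "ACGACGACGTT"  # hypothetical sequence, for illustration only
print(repeated_kmers(toy_sequence, 3))
```

Exact k-mer counting only finds perfect repeats. Diverged repeats need alignment-based methods.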
33.
Application II: Protein Sequence
• Sequence Alignment
– Non-exact string matching, gaps
– How to align two strings optimally via Dynamic Programming
– Local vs Global Alignment
– Suboptimal Alignment
– Hashing to increase speed (BLAST, FASTA)
– Amino acid substitution scoring matrices
• Multiple Alignment and Consensus Patterns
– How to align more than one sequence and then fuse the result in a consensus representation
– Transitive Comparisons
– HMMs, Profiles
– Motifs
• Scoring Schemes and Matching Statistics
– How to tell if a given alignment or match is statistically significant
– A P-value (or an E-value)?
– Score Distributions (extreme value distribution)
– Low Complexity Sequences
• Evolutionary Issues
– Rates of mutation and change
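Optimal alignment of two strings via dynamic programming can be sketched as below. This is a minimal global (Needleman-Wunsch style) alignment with a linear gap penalty. The scores of +1 for a match, -1 for a mismatch, and -1 per gap are arbitrary assumptions for illustration; real protein alignment would use a substitution matrix instead.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))
```

A local alignment (Smith-Waterman style) differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell rather than the corner.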
34.
Application III: Protein Structure
• Secondary Structure “Prediction”
– Via Propensities
– Neural Networks, Genetic Algorithms
– Simple Statistics
– Transmembrane Regions
– Assessing Secondary Structure Prediction
• Tertiary Structure Prediction
– Fold Recognition
– Threading
– Ab initio
• Function Prediction
– Active site identification
• Relation of Sequence Similarity to Structural Similarity
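One simple-statistics approach to spotting candidate transmembrane regions is a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle hydropathy scale, which is a technique swapped in for illustration and not one named on the slide. The window of 19 residues and the threshold of 1.6 are commonly quoted values and are treated here as assumptions; the toy protein is made up.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19):
    """Average hydropathy over each window of consecutive residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Start indices of windows whose average hydropathy reaches the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, avg in enumerate(profile) if avg >= threshold]


# Hypothetical protein: charged ends around a hydrophobic leucine stretch.
toy_protein = "D" * 10 + "L" * 19 + "K" * 10
print(candidate_tm_windows(toy_protein))
```

Such a profile only flags candidates. Assessing the prediction still requires comparison against experimentally determined structures.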
36.
Example Application IV: Overall Genome Characterization
• Overall Occurrence of a Certain Feature in the Genome
– e.g., how many kinases in yeast
• Compare Organisms and Tissues
– Expression levels in cancerous vs. normal tissues
• Databases, Statistics
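Comparing expression levels between cancerous and normal tissue is often summarized as a log2 fold change per gene. The sketch below is a minimal illustration under stated assumptions: the gene names and expression values are invented, and the pseudocount of 1 is an arbitrary choice to avoid dividing by zero.

```python
import math


def log2_fold_change(tumor: float, normal: float, pseudocount: float = 1.0) -> float:
    """log2 of the tumor/normal expression ratio, with a pseudocount added to both."""
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


def compare_expression(tumor_levels: dict, normal_levels: dict, pseudocount: float = 1.0) -> dict:
    """Log2 fold change for every gene measured in both tissues."""
    shared = tumor_levels.keys() & normal_levels.keys()
    return {
        gene: log2_fold_change(tumor_levels[gene], normal_levels[gene], pseudocount)
        for gene in sorted(shared)
    }


# Hypothetical expression levels, for illustration only.
tumor = {"geneA": 7.0, "geneB": 3.0, "geneC": 0.0}
normal = {"geneA": 1.0, "geneB": 3.0, "geneD": 5.0}
print(compare_expression(tumor, normal))
```

A positive value means higher expression in the tumor sample. Deciding whether a difference is significant needs replicates and a statistical test, which this sketch does not cover.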