SlideShare uma empresa Scribd logo
1 de 79
SCIENCE PROJECT

How can Science help us Eradicate Disease?
Genomes to Hit Molecules in Silico:
A Country Path Today, A Highway Tomorrow
By utkARSh veRMA
KVJNU
Supercomputing Facility for Bioinformatics & Computational Biology IITD

COST & TIME INVOLVED IN DRUG DISCOVERY
Target Discovery
2.5yrs

4%

Lead Generation
Lead Optimization
3.0yrs

15%

Preclinical Development
1.0yrs

10%

Phase I, II & III Clinical Trials
6.0yrs

68%

FDA Review & Approval
1.5yrs

3%

Drug to the Market
14 yrs

$1.4 billion

Source: PAREXEL‟s Pharmaceutical R&D Statistical Sourcebook, 2001, p96.; Hileman, Chemical Engg. News, 2006,
84, 50-1.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

CELL

TISSUE

ORGAN

ORGANISM
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Conjugate
rule acts as a
good
constraint on
the „z‟
coordinate of
chemgenome
or one can
simply use
+1/-1 as in
the adjacent
table for „z‟

TTT Phe -1
GGT Gly +1
TAT Tyr -1
GCT Ala +1
TTC Phe -1
GGC Gly +1 TAC Tyr -1
GCC Ala +1
TTA Leu -1
GGA Gly +1 TAA Stop -1
GCA Ala +1
TTG Leu -1
GGG Gly +1 TAG Stop -1 GCG Ala +1
ATT Ile -1
CGT Arg +1
CAT His +1
ACT Thr -1
ATC Ile +1
CGC Arg -1
CAC His -1
ACC Thr +1
ATA Ile +1
CGA Arg -1
CAA Gln -1
ACA Thr +1
ATG Met -1
CGG Arg +1 CAG Gln +1 ACG Thr -1
TGT Cys -1
GTT Val +1
AAT Asn -1
CCT Pro +1
TGC Cys -1
GTC Val +1
AAC Asn +1 CCC Pro -1
TGA Stop -1 GTA Val +1
AAA Lys +1
CCA Pro -1
TGG Trp -1
GTG Val +1
AAG Lys -1
CCG Pro +1
AGT Ser -1
CTT Leu +1
GAT Asp +1 TCT Ser -1
AGC Ser +1
CTC Leu -1
GAC Asp +1 TCC Ser -1
AGA Arg +1 CTA Leu -1
GAA Glu +1 TCA Ser -1
AGG Arg -1
CTG Leu +1 GAG Glu +1 TCG Ser -1
Extent of Degeneracy in Genetic Code is captured by Rule of Conjugates:
A1,2 is the conjugate of C1,2 & U1,2 is the conjugate of G1,2:(A2 x C2 & G2 x U2)
With 6 h-bonds at positions 1 and 2 between codon and anticodon, third base is inconsequential
With 4 h-bonds at positions 1 and 2 third base is essential
With 5 h-bonds middle pyrimidine renders third base inconsequential;
middle purine requires third base.
B. Jayaram, "Beyond Wobble: The Rule of Conjugates", J. Molecular Evolution, 1997, 45, 704-705.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

+

c

ω

Y
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Comparative Genomics/Proteomics for
Drug Target Identification

c

Drug Target = H  I
H = Human Genome / Proteome (Healthy Individual)
I = Genome / Proteome of the Invader / Pathogen

*Present: 1600 drugs (Drugbank) targeted to
~1000 (500 human + 500 pathogen) targets
*Andersen et al., Nature Reviews Drug Discovery, 10, 579 (2011)
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Target Directed Lead Molecule Design

Active Site
From Genome to Drug :
Sketching a Computational Pathway for Modern Drug Discovery

ChemGenome

Drug

DNA

Sanjeevini

mRNA

SEEARTINSCIENCE……
Polypeptide Sequence

Genome

Bhageerath
Tertiary Structure

Protein
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Hepatitis B virus (HBV) is a major blood-borne pathogen worldwide. Despite the
availability of an efficacious vaccine, chronic HBV infection remains a major
challenge with over 350 million carriers.
No.

HBV ORF

Protein

Function

1

ORF P

Viral polymerase

DNA polymerase, Reverse transcriptase
and RNase H activity[36,48].

2

ORF S

HBV surface proteins Envelope proteins: three in-frame start
(HBsAg, pre-S1 and codons code for the small, middle and
pre-S2)
the large surface proteins[36,49,50]. The
pre-S proteins are associated with virus
attachment to the hepatocyte[51]

3

ORF C

Core protein
HBeAg

4

ORF X

HBx protein

and HBcAg: forms the capsid [36].
HBeAg: soluble protein and its biological
function are still not understood.
However, strong epidemiological
associations with HBV replication[52] and
risk for hepatocellular carcinoma are
known[42].
Transactivator; required to establish
infection in vivo[53,54]. Associated with
multiple steps leading to
hepatocarcinogenesis[45].
Supercomputing Facility for Bioinformatics & Computational Biology IITD

United States FDA approved agents for anti-HBV therapy

Agent
Interferon alpha
Peginterferon
alpha2a
Lamivudine
Adefovir dipivoxil
Tenofovir
Entecavir
Telbivudine

Mechanism of action / class of drugs
Immune-mediated clearance
Immune-mediated clearance
Nucleoside analogue
Nucleoside analogue
Nucleoside analogue
Nucleoside analogue
Nucleoside analogue

Resistance to nucleoside analogues have been reported in over 65% of
patients on long-term treatment. It would be particularly interesting to
target proteins other than the viral polymerase.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Input the HBV Genome sequence to ChemGenome

Hepatitis B virus, complete genome
NCBI Reference Sequence: NC_003977.1
>gi|21326584|ref|NC_003977.1| Hepatitis B virus, complete genome
ChemGenome 3.0 output
Five protein coding regions identified
Gene 2 (BP: 1814 to 2452) predicted by the ChemGenome 3.0 software
encodes for the HBV precore/ core protein (Gene Id: 944568)
Supercomputing Facility for Bioinformatics & Computational Biology IITD

>gi|77680741|ref|YP_355335.1| precore/core protein
[Hepatitis B virus]
MQLFPLCLIISCSCPTVQASKLCLGWLWGMDIDPYKE
FGASVELLSFLPSDFFPSIRDLLDTASALYREALESPEH
CSPHHTALRQAILCWGELMNLATWVGSNLEDPASREL
VVSYVNVNMGLKIRQLLWFHISCLTFGRETVLEYLVS
FGVWIRTPPAYRPPNAPILSTLPETTVVRRRGRSPRRR
TPSPRRRRSQSPRRRRSQSRESQC
Input Amino acid sequence to Bhageerath-H
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Input Protein Structure to Active site identifier (ASF/Sanjeevini)
10 potential binding sites identified
Scan a million compound library
RASPD/Sanjeevini calculation with an average cut off binding affinity to
limit the number of candidates. (Empirical scoring function which builds
in Lipisnki‟s rules and Wiener index)
RASPD output
2057 molecules were selected with good binding energy from one million
molecule database corresponding to the top 5 predicted binding sites.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Out of the 2057 molecules, top 40 molecules are given as input to
ParDOCK/Sanjeevini for atomic level binding energy calculations. Out of this
40, (with a cut off of -7.5 kcal/mol), 24 molecules are seen to bind well to
precore/core protein target. These molecules could be tested in the Laboratory.
Mol. ID
0001398
0004693
0007684
0007795
0008386
0520933
0587461
0027252
0036686
0051126
0104311
0258280
0000645
0001322
0001895
0002386
0003092
0001084
0002131
0540853
1043386
0088278
0043629
0097895

Binding Energy (kcal/mol)
-10.14
-8.78
-10.05
-9.06
-8.38
-8.21
-10.22
-8.39
-8.33
-8.73
-9.3
-7.8
-7.89
-8.23
-9.49
-8.53
-8.35
-8.68
-8.07
-11.08
-10.14
-9.16
-7.5
-8.04
Supercomputing Facility for Bioinformatics & Computational Biology IITD

24 hit molecules for precore/core protein target of HBV
Supercomputing Facility for Bioinformatics & Computational Biology IITD

From Genome to Hits

Genome

X Teraflops
Chemgenome
Bhageerath
Sanjeevini

Hits

Anjali Soni, K. M. Pandey, P. Ray, B. Jayaram, “Genomes to Hits in Silico: A Country Path Today, A
Highway Tomorrow: A case study of chikungunya”, Current Pharmaceutical Design, 2013, 19, 46874700, DOI: 10.2174/13816128113199990379.
ChemGenome
A Physico-Chemical Model for identifying signatures of functional units on Genomes

Protein-nucleic acid interaction
propensity parameter
Stacking Energy

Gene

Non Gene

Protein-Nucleic acid
interaction parameter

Hydrogen Bonding

Stacking Energy
Physico-chemical fingerprints of RNA Genes

Varsha Singh, Ankita Singh, Kritika Karri, Garima Khandelwal, B. Jayaram, to be submitted, 2014.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Distinguishing Genes (blue) from Non-Genes (red)
in ~ 900 Prokaryotic Genomes
A

B

C

D

E

F

Three dimensional plots of the distributions of gene and non-gene direction vectors for six best
cases (A to F) calculated from the genomes of
(A) Agrobacterium tumefaciens (NC_003304), (B) Wolinella succinogenes (NC_005090),
(C) Rhodopseudomonas palustris (NC_005296), (D) Bordetella bronchiseptica (NC_002927),
(E) Clostridium acetobutylicium (NC_003030), (F) Bordetella pertusis (NC_002929)

Poonam Singhal, B. Jayaram, Surjit B. Dixit & David L. Beveridge, Molecular Dynamics Based
Physicochemical Model for Gene Prediction in Prokaryotic Genomes, Biophys. J., 2008, 94, 4173-4183.
The ChemGenome2.0 WebServer
http://www.scfbio-iitd.res.in/chemgenome/chemgenomenew.jsp
FTP Directory: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/CHR_X/
DEFINITION Homo sapiens chromosome X (196 MB)
ORIGIN
1 ccaggatggt ccttctcctg aaggttaatc cataggcaga tgaatcggat attgattcct
61 gttcttggaa taatctagag gatctttaga atccattggg attcataatc acagctatgc
121 cgatgccatc atcaccggct tagccctttc tgaaaacaca gtcatcatct acccccattg
181 gaatcacgat gcaaaaaacc tgtcccaaag cggtggtttc ctatgtgatt cttgcatcca
241 ggacaaatga cagtcagcag agaggcgccc tgttccatct tttggtttga tccagttaaa
301 ggcacacacg tgagcaccca acgtttgcca actcagcact gggcagagcc tggcctctga
361 ggaaattggc atcttcgtaa tcaatatatt attatgtttt attgaaatgt aagtcattgc…..

DEFINITION Homo sapiens chromosome Y (25 MB)
ORIGIN
1 ggtttcacca agttggccag gctggtctcg aactcctgac ctcaggtgat ctgtccacct
61 cggtgtccca aagtgctggg attacaggtg tgaaccacca cacccagcct catgtaatac
121 ttaaaaatga actacaggtg gattacaaac ctgaatatca aagaaaactt ttttttttga
181 aaaatagagg gaaatgtctt ataacctcag agttaggagg tttttcttag atacaataca
241 aaaagcataa ccacgcccat agtcccagct actcaggagg ctgaggcata agaatcactt
301 gagctcgaga ggtggaggtt gcagtgagcc gagatcctgc cattgcactc cagctgaggc
361 tacagagtga gagtataaaa aaaaaaaaaa aagcataacc tttaaaaatg ggttagccta
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Let us read the book of Human Genome soon like a Harry Potter novel !

Human Genome
3000 Mb

Gene & Gene related Sequences

Extra-genic DNA
2100 Mb

900 Mb

Coding DNA
90 Mb (3%) !!!

Non-coding DNA

Repetitive DNA

Unique & low copy number

420 Mb

1680 Mb

810 Mb

Tandemly repeated DNA
Satellite, micro-satellite, mini-satellite DNA

Interspersed genome wide repeats

LTR elements, Lines, Sines, DNA Transposons
The Grand Challenge of
Protein Tertiary Structure Prediction

The Bhageerath Pathway
-------GLU ALA GLU MET LYS ALA SER
VAL THR VAL LEU THR ALA LEU GLY ALA
HIS GLU ALA GLU LEU LYS PRO LEU ALA
LYS ILE PRO ILE LYS TYR LEU GLU PHE
LEU HIS----

GLU
ILE
GLN
ILE

ASP
LEU
SER
SER

LEU
LYS
HIS
GLU

LYS
LYS
ALA
ALA

LYS
LYS
THR
ILE

HIS
GLY
LYS
ILE

GLY
HIS
HIS
HIS
Supercomputing Facility for Bioinformatics & Computational Biology IITD

The 20 amino acids differ from each other in the side chain (shown in pink)
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Proteins are polymers of amino acids with unlimited
potential variety of structures and metabolic activities.
1.
2.
3.
4.
5.
6.
7.
8.
9.

Structural proteins
Protective proteins
Contractile proteins
Transport proteins
Storage proteins
Toxins
Hormones
Enzymes
.......
Supercomputing Facility for Bioinformatics & Computational Biology IITD

+

c

ω

Y

http://www.rcsb.org/pdb/home
Biological Macromolecular Resource
Supercomputing Facility for Bioinformatics & Computational Biology IITDelhi

Protein Folding Problem

Grand
Challenge
NP Complete
(hard)
problem.

Protein Structure
Prediction is an open
challenge in life sciences /
physical sciences and
Computer Science.
“Native structure” at the bottom of the
free energy Funnel. Thermodynamic
hypothesis of Anfinsen
Supercomputing Facility for Bioinformatics & Computational Biology IITD

WHY FOLD PROTEINS ?
Proteins
Hormones & factors
DNA & nuclear receptors
Ion channels
Unknown

“Proteins”- Majority of Drug Targets

•
•

Structure-based drug-design
Mapping the functions of proteins in
metabolic pathways.
Experimental Approach

Computational approaches

X-ray crystallography

NMR
spectroscopy

Comparative methods

Ab initio methods

Cost

> $100K

~ $1M

Thousands of dollars

??

Time

at least 6 months

at least 6 months

Days

Accuracy

very high

very high

Days
Depend on the similarity of
homologous template.

Moderate

Prone to failure as crystallizing a
Prone to failure,
Require a homologous
Sampling and scoring
protein is still like an art and many
and is only
template with at least 30%
limitations
Limitation proteins (e.g. membrane) cannot be
applicable to
similarity. The accuracy is
crystallized.
small proteins significantly reduced when
(<150 amino
the similarity is low.
acids).
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Computational Requirements for ab initio Protein Folding
Strategy A

• Generate

all possible conformations
and find the most stable one.

Strategy B

• Start with a straight chain and solve
F = ma to capture the most stable state

• For a protein comprising 200 AA

• A 200 AA protein evolves

assuming 2 degrees of freedom per AA

~ 10-10 sec / day / processor

• 2200

• 10-2 sec => 108 days

Structures => 2200 Minutes to
optimize and find free energy.
2200 Minutes = 3 x 1054 Years!

~ 106 years
With a million processors ~ 1 year

Anton machine is making „Strategy B‟ viable for small proteins: David E. Shaw, Paul Maragakis,
Kresten Lindorff-Larsen, Stefano Piana, Ron O. Dror, Michael P. Eastwood, Joseph A. Bank, John M.
Jumper, John K. Salmon, Yibing Shan, and Willy Wriggers, "Atomic-Level Characterization of the
Structural Dynamics of Proteins," Science, vol. 330, no. 6002, 2010, pp. 341–346.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Bhageerath Strategy: A Case Study of Mouse C-Myb
DNA Binding (52 AA)
LIKGPWTKEEDQRVIELVQKYGPKRWSVIAKHLKGRIGKQCRERWHNHLNPE

Sequence

Preformed Secondary Structure

16384 Trial Structures
Biophysical Filters & Clash Removal
10632 Structures

Energy Scans

RMSD=2.87, Energy Rank=1774

RMSD=4.0 Ang, Energy Rank=4
Blue: Native; Red: Predicted
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Development of a homology / ab initio hybrid server for proteins of all sizes
Bhageerath-H Protocol
Overall strategy:
(1) Generate
several
plausible
candidate
structures by
a mix of
methods &
(2) Score them
to realize
near-native
structures

B. Jayaram, Priyanka Dhingra, Baharat Lakhani, Shashank Shekhar, “Bhageerath: Attempting the Near
Impossible – Pushing the Frontiers of Atomic Models for Protein Tertiary Structure Prediction”, J Chemical
Supercomputing Facility for Bioinformatics & Computational Biology IITD

All the (eight) modules of the protocol are currently implemented on a dedicated
280 AMD Opteron 2.4 GHz processor cluster (~3 teraflops).
Jayaram et al., Bhageerath, Nucl. Acid Res., 2006, 34, 6195-6204
Supercomputing Facility for Bioinformatics & Computational Biology IITD

R,B,P,G
Reexamining the
language of
amino acids
(1) RRR
(2) RRB
(3) RRP
(4) RRG

(5) BBR
(6) BBB
(7) BBP
(8) BBG

R,B,P,G

R,B,P,G

64 coloured triangles are
possible. By virtue of the
symmetries of the triangle,
only 20 of these are unique.

(9) PPR
(13) GGR
(10) PPB
(14) GGB
(11) PPP
(15) GGP
(12) PPG
(16) GGG
Some observations
I. Any color occurs in exactly 10 triangles
R (1,2,3,4,5,9,13,17,18,19); B (2,5,6,7,8,10,14,17,18,20);
P (3,7,9,10,11,12,15,18,19,20); G (4,8,12,13,14,15,16,17,19,20)
II. Any two distinct colors occur together in 4 triangles
R & B (2,5,17,18); R & P (3,9,18,19); R & G (4,13,17,19)
B & P (7,10,18,20); B & G (8,14,17,20) ; P & G (12,15,19,20)
III. Any three distinct colors occur together in only one triangle
R, B & G (17); R, B & P (18); R, P & G (19); B, P &G (20)
IV. All sides with same color occurs only once
R (1); B (6); P (11); G (16)

(17) RBG
(18) RBP
(19) RPG
(20) BGP
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Rule 1. Amino acid side chains have evolved based on four chemical
properties. A minimum of one and a maximum of three properties are used
to specify each amino acid.
Rule 2. Each property occurs in exactly 10 amino acids.
Rule 3. Any two properties occur simultaneously in only four amino acids.
Rule 4. Any three properties occur simultaneously in only one amino acid.
Rule 5. Amino acids characterized by a single property occur only once.
Text book classifications do not satisfy the above rules!
Either the above rules are irrelevant to amino acids or
we need to revise our understanding of the language of proteins.
Jayaram, B.. Decoding the Design Principles of Amino Acids and the Chemical Logic of Protein
Sequences. Available from Nature Precedings. http://hdl.handle.net/10101/npre.2008.2135.1 200
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Property (I): Presence of sp3 hybridized γ carbon atom. (a) Exactly 10 amino
acids {E, I, K, L, M, P, Q, R, T, V} possess this property as required by Rule 2
above.

Property (II): Hydrogen bond donor ability. (a) Exactly 10 amino acids {C, H, K,
N, Q, R, S, T, W, Y} possess this property. (b) Also, only four amino acids (K, Q,
R, T) exhibit both properties (I & II together) as required by Rule 3.
Property (III): Absence of δ carbon. (a) Exactly 10 amino acids {A, C, D, G, I,
M, N, S, T, V} have this property. Ile is included in this set as one of the
branches of its side chain is lacking in a δ carbon. (b) I and III occur
simultaneously in only four amino acids (I, M, T, V) and similarly II and III
occur simultaneously in only four amino acids (C, N, S, T). (c) Rule 4 requires
that the above three properties (I, II and III) occur simultaneously in only one
amino acid (T) and this conforms to the expectation.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

The most likely candidate for property (IV): Absence of branching.
Linearity of the side chains / non-occurrence of bidentate forks with
terminal hydrogens in the side chains. (a) This pools together 10
amino acids in the set {A, D, E, F, H, K, M, P, S, Y}. Side chains with
single rings are treated as without forks. The sulfhydryl group in
Cys and its ability to form disulfide bridges requires it to be treated
as forked. Accepting that this property (IV) satisfies Rule 2, (b) Rule
3 is satisfied by I and IV (E, K, M, P); by II and IV (H, K, S, Y) and
by III and IV (A, D, M, S). (c) Also, Rule 4 is satisfied by I, II and IV
(K), by I, III and IV (M) and by II, III and IV (S).
With all the four properties (I, II, III and IV) specified, amino acids
characterized by a single property occur only once: property I (L),
property II (W), property III (G) and property IV (F), consistent
with Rule 5.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

20 amino acids and some chemical properties of their side chains
I. Presence of
sp3 hybridized
 carbon (g)

II. Presence of
hydrogen
bond
donor
group (d)

III. Absence of
 carbon
(s)

IV. Absence of
forks
with
hydrogens
(l)

Assignment #

Amino acid

A Alanine

No

No

Yes

Yes

g0d0s2l1

C Cysteine

No

Yes

Yes

No

g0d1s2l0

D Aspartate

No

No

Yes

Yes

g0d0s1l2

E Glutamate

Yes

No

No

Yes

g1d0s0l2

F Phenylalanine

No

No

No

Yes

g0d0s0l3

G Glycine

No

No

Yes

No

g0d0s3l0

H Histidine

No

Yes

No

Yes

g0d2s0l1

I Isoleucine

Yes

No

Yes

No

g2d0s1l0

K Lysine

Yes

Yes

No

Yes

g1d1s0l1

L Leucine

Yes

No

No

No

g3d0s0l0

M Methionine

Yes

No

Yes

Yes

g1d0s1l1

N Asparagine

No

Yes

Yes

No

g0d2s1l0

P Proline

Yes

No

No

Yes

g2d0s0l1

Q Glutamine

Yes

Yes

No

No

g1d2s0l0

R Arginine

Yes

Yes

No

No

g2d1s0l0

S Serine

No

Yes

Yes

Yes

g0d1s1l1

T Threonine

Yes

Yes

Yes

No

g1d1s1l0

V Valine

Yes

No

Yes

No

g1d0s2l0

W Tryptophan

No

Yes

No

No

g0d3s0l0

Y Tyrosine

No

Yes

No

Yes

g0d1s0l2
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Amino acid chemical logic based protein structure modeling

s

l

d

INPUT: Amino Acid
Query Sequence

Scanning Through
Protein Structure

Template Selection and Query Sequence–Template
Structure Alignment Based Upon New Chemical Logic of
Protein Sequences

Library
Template Dependent Model Structure Generation
( FULL LENGTH or FRAGMENT )

FULL LENGTH

FULL LENGTH

OUTPUT: Top 5
model Structures

Fragment Assembly to
Generate Full Length

Loop Optimization, Energy Scoring and Ranking of Full
Length Candidate Structures
Supercomputing Facility for Bioinformatics & Computational Biology IITD

“Deployment” of a Structural Metric for Capturing Native
I. Structure Metric
II. Surface Area Metric
III. Energy Metric

Total Sample
Space
of decoys

Decoys

Decoys
Decoys

5% Selection

50%
Selection

25%
Selection

Top 5
Supercomputing Facility for Bioinformatics & Computational Biology IITD

pcSM ( physico-chemical scoring function)

Avinash Mishra, Satyanarayan Rao, Aditya Mittal and B. Jayaram, “Capturing Native/Native-like Structures with a PhysicoChemical Metric (pcSM) in Protein Folding”, BBA proteins & proteomics, 2013, DOI: 10.1016/j.bbapap.2013.04.023.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

From Sequence to Structure: The Latest Bhageerath-H Pathway
Input amino acid sequence
Bhageerath-H Strgen for protein decoy
generation

DECOY
GENERATION

Amino acids chemical logic based decoy
generation

pcSM scoring for top 5 decoy selection
Quantum (AM1) structure refinement

Top 5 native-like candidate
structures

NATIVE
LIKE
STRUCTURE
SELECTION
STRUCTURE
REFINEMENT
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Results at a Glance
Input sequences : 67 CASP10 targets for which natives are released
1. Homology + de novo (Bhageerath-H at CASP10: 45%)
2. Exhaustive Homology + improved de novo (Post CASP10: 52%)
3. New alignments with chemical logic of AAs + improved de novo (61%)
4 & 5. Classical & 6. QM refinement (66%)
7. Inaccessible conformational space as of June, 2013 (34%)

Bhageerath-H at CASP10 (July, 2012):45%
Best server at CASP10 (July, 2012): 55%

Union of best predictions of top 70 servers
at CASP10 (July, 2012): 60%

Bhageerath-H (June, 2013): 66%

Further Challenges:
Going beyond 66% & improving resolution to ~ 2 to 3 Å
Path to Mt. Everest is worked out! Crossed CAMP III (66%)!!
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Bhageerath-H: A Freely Accessible Web Server
http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp

The user
inputs the
amino acid
sequence
& five
candidate
structures
for the
native are
emailed
back to
the user
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Target Directed Lead Molecule Design
Sanjeevini

B. Jayaram, Tanya Singh et al., "Sanjeevini: a freely accessible web-server for target directed lead
molecule discovery", BMC Bioinformatics, 2012, 13, S7. http://www.biomedcentral.com/14712105/13/S17/S7
Supercomputing Facility for Bioinformatics & Computational Biology IITD

COST & TIME INVOLVED IN DRUG DISCOVERY
Target Discovery
2.5yrs

4%

Lead Generation
Lead Optimization
3.0yrs

15%

Preclinical Development
1.0yrs

10%

Phase I, II & III Clinical Trials
6.0yrs

68%

FDA Review & Approval
1.5yrs

3%

Drug to the Market
14 yrs

$1.4 billion

Source: PAREXEL‟s Pharmaceutical R&D Statistical Sourcebook, 2001, p96.; Hileman, Chemical Engg. News, 2006,
84, 50-1.
Target Directed Lead Molecule Design

Computer Aided Drug Design

DRUG

NON-DRUG
De novo LEAD-LIKE MOLECULE DESIGN: THE SANJEEVINI PATHWAY
Database

User

Candidate molecules

Drug Target Identification

Drug-like filters

Mutate / Optimize

Geometry Optimization
Quantum Mechanical Derivation of Charges
Assignment of Force Field Parameters

3-Dimensional Structure
of Target

Active Site Identification on Target &
Ligand Docking
Energy Minimization of Complex
Binding free energy estimates - Scoring
Molecular dynamics &
post-facto free energy component analysis (Optional)

Lead-like compound
Jayaram et al., “Sanjeevini”, Indian J. Chemistry-A. 2006, 45A, 1834-1837.
S a n je e v in i P a t h w a y

or

N R D B M /M illio n M o le c u le
L ib r a r y /N a t u r a l P r o d u c t s
a n d T h e ir D e r iv a t iv e s

M o le c u la r D a ta b a s e

T a r g e t P ro te in /D N A

L ig a n d M o le c u le

o r,

U p lo a d

B io a v a lib a lity C h e c k
( L ip in s k i C o m p lia n c e )

B in d in g E n e rg y E s tim a tio n
b y R A S P D p ro to c o l

P r e d ic tio n o f a ll p o s s ib le
a c tiv e s ite s ( fo r p r o te in o n ly
a n d if b in d in g s ite is n o t
k n o w n ).

G e o m e tr y O p tim iz a tio n
T P A C M 4 /Q u a n tu m M e c h a n ic a l
D e r iv a tio n o f C h a r g e s

A s s ig n m e n t o f F o r c e F ie ld P a r a m e te r s

L ig a n d M o le c u le r e a d y f o r D o c k in g

P r o te in /D N A r e a d y f o r D o c k in g

NH2

N

N

+
OH

or

NH

N
N

D o c k & S c o re

M o le c u la r d y n a m ic s & p o s t- fa c to f r e e e n e r g y c o m p o n e n t a n a ly s is

( O p tio n a l)
Supercomputing facility for bioinformatics and computational biology IIT Delhi

Molecular Descriptors / Drug-like Filters
Lipinski’s rule of five
Molecular weight

 500

Number of Hydrogen bond acceptors < 10
Number of Hydrogen bond donors

<5

logP

5

Additional filters
Molar Refractivity

 140

Number of Rotatable bonds

< 10
http://www.scfbio-iitd.res.in/utility/LipinskiFilters.jsp
http://www.scfbio-iitd.res.in/dock/ActiveSite_new.jsp

Tanya Singh, D. Biswas, B. Jayaram, 2011, J. Chem. Inf. Modeling, 51 (10), 2515-2527.
http://www.scfbio-iitd.res.in/software/drugdesign/raspd.jsp

Goutam Mukherjee and B. Jayaram, "A Rapid Identification of Hit Molecules for Target Proteins via PhysicoChemical
Descriptors",
Phys.
Chem.
Chem.
Phys.,
2013,
DOI:10.1039/C3CP44697B.
Supercomputing facility for bioinformatics and computational biology IIT Delhi

Quantum Chemistry on Candidate drugs for
Assignment of Force Field Parameters
-0.1206

0.1251
-0.0653
0.0083
0.1727

AM1

0.1251

6-31G*/RESP

-0.1838
-0.0653
0.0166

-0.1838

TPACM-4

0.1382

-0.5783
0.1727

0.1335

-0.1044
-0.1718

0.0191
0.1191

-0.2085
0.0387

0.1191

-0.0516
-0.3440

0.1404
-0.0162

0.0796

0.0796

-0.0341
0.1191

-0.0099

0.1302
-0.0099

0.0796

-0.7958

-0.7958

G. Mukherjee, N. Patra, P. Barua and B. Jayaram, J. Computational Chemistry, 32, 893-907 (2011).
http://www.scfbio-iitd.res.in/software/drugdesign/charge.jsp
Supercomputing Facility for Bioinformatics & Computational Biology IITD

MONTE CARLO DOCKING OF THE CANDIDATE DRUG IN THE
ACTIVE - SITE OF THE TARGET

www.scfbio-iitd.res.in/dock/pardock.jsp

+
RMSD between the docked &
the crystal structure is 0.2Å
ENERGY MINIMIZATION

5 STRUCTURES WITH LOWEST ENERGY SELECTED
A. Gupta, A. Gandhimathi, P. Sharma, and B. Jayaram, “ParDOCK: An All Atom Energy Based Monte Carlo Docking
Protocol for Protein-Ligand Complexes”, Protein and Peptide Letters, 2007, 14(7), 632-646.
Supercomputing facility for bioinformatics and computational biology IIT Delhi

Binding Affinity Analysis
After obtaining candidate molecules from docking an dscoring, molecular dynamics simulations
followed by free energy analyses (MMPBSA/MMGBSA) are recommended.

+

G0

[Protein]aq

+

[Protein*Inhibitor*]aq

II

I

[Protein*]aq

[Inhibitor*]aq

VI

IV

III

[Protein*]vac

[Inhibitor]aq

V

+

[Inhibitor*]vac

[Protein*Inhibitor*]vac

Parul Kalra, Vasisht Reddy, B. Jayaram, “A Free Energy Component Analysis of HIV-I ProteaseInhibitor Binding”, J. Med.Chem., 2001, 44, 4325-4338.
Supercomputing facility for Bioinformatics and Computational Biology IIT Delhi

Affinity / Specificity Matrix for Drugs and Their Targets/Non-Targets
Shaikh, S., Jain. T., Sandhu, G., Latha, N., Jayaram., B., A physico-chemical pathway from targets to leads, 2007, Current Pharmaceutical
Design, 13, 3454-3470.
Drug1

Drug2

Drug3

Drug4

Drug5

Drug6

Drug7

Drug8

Drug9

Drug10

Drug11

Drug12

Drug13

Drug14

Target1
Target2
Target3

Target4
Target5
Target6
Target7
Target8

Target9
Target10
Target11
Target12
Target13
Target14

BLUE: HIGH BINDING AFFINITY

GREEN: MODERATE AFFINITY

ORANGE: POOR AFFINITY

Diagonal elements represent drug-target binding affinity and off-diagonal elements show drug-non target binding affinity. Drug 1 is specific to Target 1, Drug 2 to Target 2 and so on. Target 1 is
lymphocyte function-associated antigen LFA-1 (CD11A) (1CQP; Immune system adhesion receptor) and Drug 1 is lovastatin.Target 2 is Human Coagulation Factor (1CVW; Hormones &
Factors) and Drug 2 is 5-dimethyl amino 1-naphthalene sulfonic acid (dansyl acid). Target 3 is retinol-binding protein (1FEL; Transport protein) and Drug 3 is n-(4-hydroxyphenyl)all-trans
retinamide (fenretinide). Target 4 is human cardiac troponin C (1LXF; metal binding protein) and Drug 4 is 1-isobutoxy-2-pyrrolidino-3[n-benzylanilino] propane (Bepridil). Target 5 is DNA
{1PRP; d(CGCGAATTCGCG)} and Drug 5 is propamidine. Target 6 is progesterone receptor (1SR7; Nuclear receptor) and Drug 6 is mometasone furoate. Target 7 is platelet receptor for
fibrinogen (Integrin Alpha-11B) (1TY5; Receptor) and Drug 7 is n-(butylsulfonyl)-o-[4-(4-piperidinyl)butyl]-l-tyrosine (Tirofiban). Target 8 is human phosphodiesterase 4B (1XMU; Enzyme)
and Drug 8 is 3-(cyclopropylmethoxy)-n-(3,5-dichloropyridin-4-yl)-4-(difluoromethoxy)benzamide (Roflumilast). Target 9 is Potassium Channel (2BOB; Ion Channel) and Drug 9 is
tetrabutylammonium. Target 10 is {2DBE; d(CGCGAATTCGCG)} and Drug 10 is Diminazene aceturate (Berenil). Target 11 is Cyclooxygenase-2 enzyme (4COX; Enzymes) and Drug 11 is
indomethacin. Target 12 is Estrogen Receptor (3ERT; Nuclear Receptors) and Drug 12 is 4-hydroxytamoxifen. Target 13 is ADP/ATP Translocase-1 (1OKC; Transport protein) and Drug 13 is
carboxyatractyloside. Target 14 is Glutamate Receptor-2 (2CMO; Ion channel) and Drug 14 is 2-({[(3e)-5-{4-[(dimethylamino)(dihydroxy)-lambda~4~-sulfanyl]phenyl}-8-methyl-2-oxo6,7,8,9-tetrahydro-1H-pyrrolo[3,2-H]isoquinolin-3(2H)-ylidene]amino}oxy)-4-hydroxybutanoic acid. The binding affinities are calculated using the software made available at
http://www.scfbio-iitd.res.in/software/drugdesign/bappl.jsp and http://www.scfbio-iitd.res.in/preddicta.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Future of Drug Discovery: Towards a Molecular View of
ADME/T
D ru g

S ite o f A d m in istra tio n

Parenteral Route

O r a l R o u te

A b so rp tio n

Resorption

D istrib u tio n fro m P la sm a
B ound D rug

U nbound D rug

M e ta b o lism

E x c re tio n

L iv e r

B ile , S a liv a , S w e a t,

S ite o f A c tio n

K id n e y

D rug T arget

The distribution path of an orally administered drug molecule inside the body is depicted. Black solid
arrows: Complete path of drug starting from absorption at site of administration to distribution to the
various compartments in the body, like sites of metabolism, drug action and excretion. Dashed arrows:
Path of the drug after metabolism. Dash-dot arrows: Path of drug after eliciting its required action on
the target. Dot arrows: Path of the drug after being reabsorbed into circulation from the site of
excretion. Affinity/specificity are under control but toxicity is yet to be conquered.
OVERVIEW OF METABOLISM AND TRANSPORT IN P.falciparum
Supercomputing Facility for Bioinformatics & Computational Biology IITD

From Genome to Hits

Genome

X Teraflops
Chemgenome
Bhageerath
Sanjeevini

Hits
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Chikungunya Virus
Chikungunya is one of the most important re-emerging viral
borne disease spreading globally with sporadic intervals. It is
categorized as a BSL3 pathogen and under „C‟ grade by
National Institute of Allergy and Infectious Diseases (NIAID), in
2008. But, yet no approved drug/vaccine is available currently in
the public domain for its treatment/prevention.

Anjali Soni, Khushhali Menaria, Pratima Ray and B. Jayaram. “Genomes to Hits in Silico: A Country
Path Today, A Highway Tomorrow: A case study of chikungunya”, Current Pharmaceutical Design,
2013, 19, 4687-4700 .
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Some available information on CHIKV proteins but no structures
Protein Type Proteins
Functions
NonStructural nsP1
 Methyl transferase domain (acts as cytoplasmic capping enzyme)
Proteins
nsP2
 Viral RNA helicase domain (part of the RNA polyemerase complex)
 Peptidase C9 domain (cleaves four mature proteins from non structural
polyprotein)
nsP3
 Appr. 1-processng domain (minus strand and subgenomic 26S mRNA
synthesis)
nsP4
 Viral RNA dependent RNA polymerase domain (Replicates genomic and
antigenomic RNA and also transcribes 26S subgenomic RNA which
encodes for structural proteins)
Structural
C
 Peptidase_S3 domain (autocatalytic cleavage)
proteins
E3
 Alpha virus E3 spike glycoprotein domain
E2
 Alpha virus E2 glycoprotein domain (viral attachment to host)
6K
 Alpha virus E1 glycoprotein domain (viral glycoprotein processing and
membrane permeabilization)
 Signal peptide domain
E1
 Alpha virus E1 glycoprotein domain (class II viral fusion protein)
 Glycoprotein E dimerization domain (forms E1-E2 heterodimers in
inactive state and E1 trimers in active state)
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Flow diagram illustrating the steps involved in achieving hit molecules
from genomic information
Whole genome sequence of Chikungunya virus is retrieved from NCBI: NC_004162.2
http://www.ncbi.nlm.nih.gov/nuccore/27754751?report=fasta
Genes are predicted which are then translated to the protein sequences using Chemgenome 3.0
http://www.scfbio-iitd.res.in/chemgenome/chemgenome3.jsp

These polyprotein sequences are spliced w.r.t literature and results are processed for 3-D structure prediction
by Bhageerath-H
http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp

Modeled structures are studied for identification of potential active sites by active site finder AADS
http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp

A million compound library of small molecules is screened against the predicted binding sites using RASPD
http://www.scfbio-iitd.res.in/software/drugdesign/raspd.jsp

The screened molecules are docked, scored and optimized iteratively using SANJEEVINI
http://www.scfbio-iitd.res.in/sanjeevini/sanjeevini.jsp

Hits ready to be synthesized and tested in
laboratory
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Input the CHIKV Genome sequences to ChemGenome 3.0:
Chikungunya virus (strain S27-African prototype),
complete genome
NCBI Ref_Sequence: NC_004162.2

ChemGenome 3.0 output
Two protein coding regions are identified. These proteins
are the polyproteins.
Genes

Start

End

Type

1

77

7501

Nonstructural Polyproteins

2

7567

11313

Structural Polyproteins
Supercomputing Facility for Bioinformatics & Computational Biology IITD

The nonstructural polyproteins are cleaved into 4 protein sequences
w.r.t literature. These sequences serve as input to Bhageerath-H server.

Bhageerath-H output
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Input Protein Structures to an Automated version of Active site
finder (AADS/Sanjeevini)
10 potential binding sites are identified against each model of the proteins
(shown as black dots in the figure)

Scanning against a million compound library
RASPD/Sanjeevini calculations were carried in search of the potential
therapeutics with an average cut-off binding affinity to limit the number of
candidates. (RASPD uses an empirical scoring function which builds in
Lipinski‟s rules and Wiener index).
Supercomputing Facility for Bioinformatics & Computational Biology IITD

RASPD output
Top 100 molecules were screened with the cutoff binding energy to be -8.00
kcal/mol. Out of these 100, one molecule for each model is selected with
good binding energy from one million molecule database corresponding to
the top 5 predicted binding sites. The molecules were choosen for atomic
level binding energy calculations using ParDOCK/Sanjeevini.
These molecules could be tested in the Laboratory.
Supercomputing Facility for Bioinformatics & Computational Biology IITD

In silico suggestions of candidate molecules against CHIKV
Supercomputing Facility for Bioinformatics & Computational Biology IITD

In progress
Sub-micromolar compounds identified using Sanjeevini
(after design, synthesis & testing) against
Malaria (Jamia Millia & ICGEB), Breast Cancer (Amity),
Alzheimers (KSBS, IITD) & Diabetes (IITD)

These are under further development
Supercomputing Facility for Bioinformatics & Computational Biology IITD

SCFBio Team
16 processor Linux Cluster

Storage Area Network

~ 10 teraflops of computing; 50 terabytes of storage
Supercomputing Facility for Bioinformatics & Computational Biology IITD

BioComputing Group, IIT Delhi (PI : Prof. B. Jayaram)

Shashank Shekhar
Priyanka Dhingra
Ashutosh Shandilya
Varsha Singh
Prashant Rana
Manpreet Singh
Preeti Bisht

Present
Goutam Mukherjee
Vandana Shekhar
Abhilash Jayaraj
Ankita Singh
Rahul Kaushik
Samar Chakraborty
Sanjeev Kumar

Dr. Achintya Das
Dr. Tarun Jain
Dr. Kumkum Bhushan
Dr. Nidhi Arora
Pankaj Sharma
A.Gandhimathi
Neelam Singh
Dr. Sandhya Shenoy
Sahil Kapoor
Navneet Tomar

Former
Dr. N. Latha
Dr. Saher Shaikh
Dr. Poonam Singhal
Dr. Garima Khandelwal
Praveen Agrawal
Gurvisha Sandhu
Shailesh Tripathi
Rebecca Lee
Satyanarayan Rao
Surojit Bose

Tanya Singh
Avinash Mishra
Anjali Soni
Debarati Dasgupta
Ali Khosravi
R. Nagarajan

Dr. Pooja Narang
Dr. Parul Kalra
Dr. Surjit Dixit
Dr. E. Rajasekaran
Vidhu Pandey
Anuj Gupta
Dhrubajyoti Biswas
Bharat Lakhani
Pooja Khurana
Kritika Karri
Under Incubation at IITD (since April, 2011)
Recipient of TATA NEN 2012 Award
Recipient of Biospectrum 2013 Award
Recipient of BioAsia 2014 Award
Novel Technologies

Computational
Network
Genomics

Target
Discovery
Proteomics

Compound
Screening

NI research pipeline
Sahil Kapoor
Avinash Mishra
Shashank Shekhar

Hit Molecules
Supercomputing Facility for Bioinformatics & Computational Biology IITD

Acknowledgements
Department of Biotechnology
Department of Science & Technology
Ministry of Information Technology
Council of Scientific & Industrial Research

Indo-French Centre for the Promotion of Advanced Research (CEFIPRA)
HCL Life Science Technologies
Dabur Research Foundation

Indian Institute of Technology, Delhi
Visit Us at

www.scfbio-iitd.res.in
Thank You

Mais conteúdo relacionado

Mais procurados

Assay Virtual Screening Compounds for the Inhibitory Potencies against BACE 1
Assay Virtual Screening Compounds for the Inhibitory Potencies against BACE 1Assay Virtual Screening Compounds for the Inhibitory Potencies against BACE 1
Assay Virtual Screening Compounds for the Inhibitory Potencies against BACE 1
Ben Leong
 
Genome editing as a tool for enhancing disease resistance in crops - Vladimir...
Genome editing as a tool for enhancing disease resistance in crops - Vladimir...Genome editing as a tool for enhancing disease resistance in crops - Vladimir...
Genome editing as a tool for enhancing disease resistance in crops - Vladimir...
OECD Environment
 
Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...
Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...
Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...
Thermo Fisher Scientific
 
2015 - Cdk5 promotes DNA replication stress checkpoint activation through RPA...
2015 - Cdk5 promotes DNA replication stress checkpoint activation through RPA...2015 - Cdk5 promotes DNA replication stress checkpoint activation through RPA...
2015 - Cdk5 promotes DNA replication stress checkpoint activation through RPA...
Simon Gemble
 
Aug2013 tumor normal whole genome sequencing
Aug2013 tumor normal whole genome sequencingAug2013 tumor normal whole genome sequencing
Aug2013 tumor normal whole genome sequencing
GenomeInABottle
 

Mais procurados (20)

PNAS 1992 FBH.PDF
PNAS 1992 FBH.PDFPNAS 1992 FBH.PDF
PNAS 1992 FBH.PDF
 
HIV RT Inhibitions in vitro and or or in vivo: materials and methods
HIV RT Inhibitions in vitro and or or in vivo: materials and methods HIV RT Inhibitions in vitro and or or in vivo: materials and methods
HIV RT Inhibitions in vitro and or or in vivo: materials and methods
 
Homspera march2010
Homspera march2010Homspera march2010
Homspera march2010
 
vineeta poster 2
vineeta  poster  2vineeta  poster  2
vineeta poster 2
 
Defining the relevant genome in solid tumors
Defining the relevant genome in solid tumorsDefining the relevant genome in solid tumors
Defining the relevant genome in solid tumors
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
 
PREVALENCE OF CANCER ASSOCIATED GENES IN BREAST CANCER PATIENTS IN THE HOSPIT...
PREVALENCE OF CANCER ASSOCIATED GENES IN BREAST CANCER PATIENTS IN THE HOSPIT...PREVALENCE OF CANCER ASSOCIATED GENES IN BREAST CANCER PATIENTS IN THE HOSPIT...
PREVALENCE OF CANCER ASSOCIATED GENES IN BREAST CANCER PATIENTS IN THE HOSPIT...
 
Assay Virtual Screening Compounds for the Inhibitory Potencies against BACE 1
Assay Virtual Screening Compounds for the Inhibitory Potencies against BACE 1Assay Virtual Screening Compounds for the Inhibitory Potencies against BACE 1
Assay Virtual Screening Compounds for the Inhibitory Potencies against BACE 1
 
Precision Breeding in Plants
Precision Breeding in PlantsPrecision Breeding in Plants
Precision Breeding in Plants
 
Genome editing as a tool for enhancing disease resistance in crops - Vladimir...
Genome editing as a tool for enhancing disease resistance in crops - Vladimir...Genome editing as a tool for enhancing disease resistance in crops - Vladimir...
Genome editing as a tool for enhancing disease resistance in crops - Vladimir...
 
Transitioning to gr_ch38
Transitioning to gr_ch38Transitioning to gr_ch38
Transitioning to gr_ch38
 
Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...
Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...
Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...
 
Oliver_2014
Oliver_2014Oliver_2014
Oliver_2014
 
2015 - Cdk5 promotes DNA replication stress checkpoint activation through RPA...
2015 - Cdk5 promotes DNA replication stress checkpoint activation through RPA...2015 - Cdk5 promotes DNA replication stress checkpoint activation through RPA...
2015 - Cdk5 promotes DNA replication stress checkpoint activation through RPA...
 
Church dm grc_workshop
Church dm grc_workshopChurch dm grc_workshop
Church dm grc_workshop
 
Aug2013 tumor normal whole genome sequencing
Aug2013 tumor normal whole genome sequencingAug2013 tumor normal whole genome sequencing
Aug2013 tumor normal whole genome sequencing
 
Development of SSR markers in mungbean
Development of SSR markers in mungbeanDevelopment of SSR markers in mungbean
Development of SSR markers in mungbean
 
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
Assessment of TP53 Mutation Status in Breast Tumor Tissue using the "Ion Ampl...
 
Homology modeling, docking and comparative study of the selectivity of some h...
Homology modeling, docking and comparative study of the selectivity of some h...Homology modeling, docking and comparative study of the selectivity of some h...
Homology modeling, docking and comparative study of the selectivity of some h...
 
cancer biology(ppt)
 cancer biology(ppt) cancer biology(ppt)
cancer biology(ppt)
 

Semelhante a Eradicating diseases (genome)

Probes 2010
Probes 2010Probes 2010
Probes 2010
toluene
 
Protein-protein interaction
Protein-protein interactionProtein-protein interaction
Protein-protein interaction
sigma-tau
 
Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ...
Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ...Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ...
Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ...
AvactaLifeSciences
 
QMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyQMB_Poster_Tom_Kelly
QMB_Poster_Tom_Kelly
Tom Kelly
 
Thesis defense - QUANG ONG - FINAL
Thesis defense - QUANG ONG - FINALThesis defense - QUANG ONG - FINAL
Thesis defense - QUANG ONG - FINAL
Quang Ong
 

Semelhante a Eradicating diseases (genome) (20)

Docking studies
Docking studiesDocking studies
Docking studies
 
Docking studies
Docking studiesDocking studies
Docking studies
 
Probes 2010
Probes 2010Probes 2010
Probes 2010
 
Analysis of Parasporin (Anti-cancer) proteins of Bacillus thuringiensis.pdf
Analysis of Parasporin (Anti-cancer) proteins of Bacillus thuringiensis.pdfAnalysis of Parasporin (Anti-cancer) proteins of Bacillus thuringiensis.pdf
Analysis of Parasporin (Anti-cancer) proteins of Bacillus thuringiensis.pdf
 
Antibody-Producing Cell Lines Development
Antibody-Producing Cell Lines DevelopmentAntibody-Producing Cell Lines Development
Antibody-Producing Cell Lines Development
 
JoB spike in manuscript 2014
JoB spike in manuscript 2014JoB spike in manuscript 2014
JoB spike in manuscript 2014
 
Protein-protein interaction
Protein-protein interactionProtein-protein interaction
Protein-protein interaction
 
Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ...
Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ...Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ...
Avacta Life Sciences Affimers Presentation Global Protein Engineering Summit ...
 
QMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyQMB_Poster_Tom_Kelly
QMB_Poster_Tom_Kelly
 
Pre-clinical drug prioritization via prognosis-guided genetic interaction net...
Pre-clinical drug prioritization via prognosis-guided genetic interaction net...Pre-clinical drug prioritization via prognosis-guided genetic interaction net...
Pre-clinical drug prioritization via prognosis-guided genetic interaction net...
 
Host cell protein dynamics in recombinant CHO cells-Impact from harvest to Pu...
Host cell protein dynamics in recombinant CHO cells-Impact from harvest to Pu...Host cell protein dynamics in recombinant CHO cells-Impact from harvest to Pu...
Host cell protein dynamics in recombinant CHO cells-Impact from harvest to Pu...
 
Unc slides on computational toxicology
Unc slides on computational toxicologyUnc slides on computational toxicology
Unc slides on computational toxicology
 
20140613 Analysis of High Throughput DNA Methylation Profiling
20140613 Analysis of High Throughput DNA Methylation Profiling20140613 Analysis of High Throughput DNA Methylation Profiling
20140613 Analysis of High Throughput DNA Methylation Profiling
 
Thesis defense - QUANG ONG - FINAL
Thesis defense - QUANG ONG - FINALThesis defense - QUANG ONG - FINAL
Thesis defense - QUANG ONG - FINAL
 
2015 07 09__epigenetic_profiling_environmental_health_sciences_v42
2015 07 09__epigenetic_profiling_environmental_health_sciences_v422015 07 09__epigenetic_profiling_environmental_health_sciences_v42
2015 07 09__epigenetic_profiling_environmental_health_sciences_v42
 
Drug Discovery Today: Fighting TB with Technology
Drug Discovery Today: Fighting TB with TechnologyDrug Discovery Today: Fighting TB with Technology
Drug Discovery Today: Fighting TB with Technology
 
Grafström - Lush Prize Conference 2014
Grafström - Lush Prize Conference 2014Grafström - Lush Prize Conference 2014
Grafström - Lush Prize Conference 2014
 
Lombardi_TUMA 14 - Pesaro
Lombardi_TUMA 14 - PesaroLombardi_TUMA 14 - Pesaro
Lombardi_TUMA 14 - Pesaro
 
2015 bioinformatics personal_genomics_wim_vancriekinge
2015 bioinformatics personal_genomics_wim_vancriekinge2015 bioinformatics personal_genomics_wim_vancriekinge
2015 bioinformatics personal_genomics_wim_vancriekinge
 
Bigger Data to Increase Drug Discovery
Bigger Data to Increase Drug DiscoveryBigger Data to Increase Drug Discovery
Bigger Data to Increase Drug Discovery
 

Mais de Utkarsh Verma

Mais de Utkarsh Verma (20)

Magnetic resonance imaging (mri)
Magnetic resonance imaging (mri)Magnetic resonance imaging (mri)
Magnetic resonance imaging (mri)
 
Inspire scholarship
Inspire scholarshipInspire scholarship
Inspire scholarship
 
Helicobacter pylori (cancer)
Helicobacter pylori (cancer)Helicobacter pylori (cancer)
Helicobacter pylori (cancer)
 
Genetic susceptibility
Genetic susceptibilityGenetic susceptibility
Genetic susceptibility
 
Biotechnological applications for environmental waste management
Biotechnological applications for environmental waste managementBiotechnological applications for environmental waste management
Biotechnological applications for environmental waste management
 
Medical detective(forensic)
Medical detective(forensic)Medical detective(forensic)
Medical detective(forensic)
 
Tourism and eco tourism in india
Tourism and eco tourism in indiaTourism and eco tourism in india
Tourism and eco tourism in india
 
ITER
ITERITER
ITER
 
Nationalism in india
Nationalism in india Nationalism in india
Nationalism in india
 
Land degradation
Land degradation Land degradation
Land degradation
 
Ajanta caves
Ajanta caves Ajanta caves
Ajanta caves
 
087 x sa2_02_b1 qp social science
087 x sa2_02_b1 qp social science087 x sa2_02_b1 qp social science
087 x sa2_02_b1 qp social science
 
087 x sa2_44_a1 qp social science
087 x sa2_44_a1 qp social science087 x sa2_44_a1 qp social science
087 x sa2_44_a1 qp social science
 
Life span
Life spanLife span
Life span
 
Stephen william hawking
Stephen william hawkingStephen william hawking
Stephen william hawking
 
Sabarmati ashram
Sabarmati ashram Sabarmati ashram
Sabarmati ashram
 
Robert baden
Robert badenRobert baden
Robert baden
 
Olympic games
Olympic gamesOlympic games
Olympic games
 
Nelson mandela
Nelson mandelaNelson mandela
Nelson mandela
 
Monumental heritage in delhi
Monumental heritage in delhiMonumental heritage in delhi
Monumental heritage in delhi
 

Último

Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...
Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...
Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...
Cara Menggugurkan Kandungan 087776558899
 
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
Sheetaleventcompany
 
💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...
💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...
💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...
Sheetaleventcompany
 
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Sheetaleventcompany
 
Kolkata Call Girls Service ❤️🍑 9xx000xx09 👄🫦 Independent Escort Service Kolka...
Kolkata Call Girls Service ❤️🍑 9xx000xx09 👄🫦 Independent Escort Service Kolka...Kolkata Call Girls Service ❤️🍑 9xx000xx09 👄🫦 Independent Escort Service Kolka...
Kolkata Call Girls Service ❤️🍑 9xx000xx09 👄🫦 Independent Escort Service Kolka...
Sheetaleventcompany
 
Difference Between Skeletal Smooth and Cardiac Muscles
Difference Between Skeletal Smooth and Cardiac MusclesDifference Between Skeletal Smooth and Cardiac Muscles
Difference Between Skeletal Smooth and Cardiac Muscles
MedicoseAcademics
 

Último (20)

Call girls Service Phullen / 9332606886 Genuine Call girls with real Photos a...
Call girls Service Phullen / 9332606886 Genuine Call girls with real Photos a...Call girls Service Phullen / 9332606886 Genuine Call girls with real Photos a...
Call girls Service Phullen / 9332606886 Genuine Call girls with real Photos a...
 
Call Girls Shahdol Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Shahdol Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Shahdol Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Shahdol Just Call 8250077686 Top Class Call Girl Service Available
 
Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...
Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...
Cara Menggugurkan Kandungan Dengan Cepat Selesai Dalam 24 Jam Secara Alami Bu...
 
Call Girls Kathua Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kathua Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Kathua Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kathua Just Call 8250077686 Top Class Call Girl Service Available
 
ANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptxANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF RESPIRATORY SYSTEM.pptx
 
Chandigarh Call Girls Service ❤️🍑 9809698092 👄🫦Independent Escort Service Cha...
Chandigarh Call Girls Service ❤️🍑 9809698092 👄🫦Independent Escort Service Cha...Chandigarh Call Girls Service ❤️🍑 9809698092 👄🫦Independent Escort Service Cha...
Chandigarh Call Girls Service ❤️🍑 9809698092 👄🫦Independent Escort Service Cha...
 
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
Call Girl In Indore 📞9235973566📞 Just📲 Call Inaaya Indore Call Girls Service ...
 
Kolkata Call Girls Naktala 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
Kolkata Call Girls Naktala  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl Se...Kolkata Call Girls Naktala  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl Se...
Kolkata Call Girls Naktala 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
 
❤️Chandigarh Escorts Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ ...
❤️Chandigarh Escorts Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ ...❤️Chandigarh Escorts Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ ...
❤️Chandigarh Escorts Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ ...
 
Call Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service Available
Call Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service AvailableCall Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service Available
Call Girls Mussoorie Just Call 8854095900 Top Class Call Girl Service Available
 
💰Call Girl In Bangalore☎️7304373326💰 Call Girl service in Bangalore☎️Bangalor...
💰Call Girl In Bangalore☎️7304373326💰 Call Girl service in Bangalore☎️Bangalor...💰Call Girl In Bangalore☎️7304373326💰 Call Girl service in Bangalore☎️Bangalor...
💰Call Girl In Bangalore☎️7304373326💰 Call Girl service in Bangalore☎️Bangalor...
 
💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...
💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...
💚Chandigarh Call Girls Service 💯Piya 📲🔝8868886958🔝Call Girls In Chandigarh No...
 
❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...
❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...
❤️Call Girl Service In Chandigarh☎️9814379184☎️ Call Girl in Chandigarh☎️ Cha...
 
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
 
Kolkata Call Girls Service ❤️🍑 9xx000xx09 👄🫦 Independent Escort Service Kolka...
Kolkata Call Girls Service ❤️🍑 9xx000xx09 👄🫦 Independent Escort Service Kolka...Kolkata Call Girls Service ❤️🍑 9xx000xx09 👄🫦 Independent Escort Service Kolka...
Kolkata Call Girls Service ❤️🍑 9xx000xx09 👄🫦 Independent Escort Service Kolka...
 
Difference Between Skeletal Smooth and Cardiac Muscles
Difference Between Skeletal Smooth and Cardiac MusclesDifference Between Skeletal Smooth and Cardiac Muscles
Difference Between Skeletal Smooth and Cardiac Muscles
 
💚Reliable Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girl In Chandigarh N...
💚Reliable Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girl In Chandigarh N...💚Reliable Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girl In Chandigarh N...
💚Reliable Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girl In Chandigarh N...
 
Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...
Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...
Call Girls Bangalore - 450+ Call Girl Cash Payment 💯Call Us 🔝 6378878445 🔝 💃 ...
 
Cheap Rate Call Girls Bangalore {9179660964} ❤️VVIP BEBO Call Girls in Bangal...
Cheap Rate Call Girls Bangalore {9179660964} ❤️VVIP BEBO Call Girls in Bangal...Cheap Rate Call Girls Bangalore {9179660964} ❤️VVIP BEBO Call Girls in Bangal...
Cheap Rate Call Girls Bangalore {9179660964} ❤️VVIP BEBO Call Girls in Bangal...
 
Gastric Cancer: Сlinical Implementation of Artificial Intelligence, Synergeti...
Gastric Cancer: Сlinical Implementation of Artificial Intelligence, Synergeti...Gastric Cancer: Сlinical Implementation of Artificial Intelligence, Synergeti...
Gastric Cancer: Сlinical Implementation of Artificial Intelligence, Synergeti...
 

Eradicating diseases (genome)

  • 1. SCIENCE PROJECT How can Science help us Eradicate Disease? Genomes to Hit Molecules in Silico: A Country Path Today, A Highway Tomorrow By utkARSh veRMA KVJNU
  • 2. Supercomputing Facility for Bioinformatics & Computational Biology IITD COST & TIME INVOLVED IN DRUG DISCOVERY Target Discovery 2.5yrs 4% Lead Generation Lead Optimization 3.0yrs 15% Preclinical Development 1.0yrs 10% Phase I, II & III Clinical Trials 6.0yrs 68% FDA Review & Approval 1.5yrs 3% Drug to the Market 14 yrs $1.4 billion Source: PAREXEL‟s Pharmaceutical R&D Statistical Sourcebook, 2001, p96.; Hileman, Chemical Engg. News, 2006, 84, 50-1.
  • 3. Supercomputing Facility for Bioinformatics & Computational Biology IITD CELL TISSUE ORGAN ORGANISM
  • 4.
  • 5. Supercomputing Facility for Bioinformatics & Computational Biology IITD Conjugate rule acts as a good constraint on the „z‟ coordinate of chemgenome or one can simply use +1/-1 as in the adjacent table for „z‟ TTT Phe -1 GGT Gly +1 TAT Tyr -1 GCT Ala +1 TTC Phe -1 GGC Gly +1 TAC Tyr -1 GCC Ala +1 TTA Leu -1 GGA Gly +1 TAA Stop -1 GCA Ala +1 TTG Leu -1 GGG Gly +1 TAG Stop -1 GCG Ala +1 ATT Ile -1 CGT Arg +1 CAT His +1 ACT Thr -1 ATC Ile +1 CGC Arg -1 CAC His -1 ACC Thr +1 ATA Ile +1 CGA Arg -1 CAA Gln -1 ACA Thr +1 ATG Met -1 CGG Arg +1 CAG Gln +1 ACG Thr -1 TGT Cys -1 GTT Val +1 AAT Asn -1 CCT Pro +1 TGC Cys -1 GTC Val +1 AAC Asn +1 CCC Pro -1 TGA Stop -1 GTA Val +1 AAA Lys +1 CCA Pro -1 TGG Trp -1 GTG Val +1 AAG Lys -1 CCG Pro +1 AGT Ser -1 CTT Leu +1 GAT Asp +1 TCT Ser -1 AGC Ser +1 CTC Leu -1 GAC Asp +1 TCC Ser -1 AGA Arg +1 CTA Leu -1 GAA Glu +1 TCA Ser -1 AGG Arg -1 CTG Leu +1 GAG Glu +1 TCG Ser -1 Extent of Degeneracy in Genetic Code is captured by Rule of Conjugates: A1,2 is the conjugate of C1,2 & U1,2 is the conjugate of G1,2:(A2 x C2 & G2 x U2) With 6 h-bonds at positions 1 and 2 between codon and anticodon, third base is inconsequential With 4 h-bonds at positions 1 and 2 third base is essential With 5 h-bonds middle pyrimidine renders third base inconsequential; middle purine requires third base. B. Jayaram, "Beyond Wobble: The Rule of Conjugates", J. Molecular Evolution, 1997, 45, 704-705.
  • 6. Supercomputing Facility for Bioinformatics & Computational Biology IITD + c  ω Y
  • 7. Supercomputing Facility for Bioinformatics & Computational Biology IITD Comparative Genomics/Proteomics for Drug Target Identification c Drug Target = H  I H = Human Genome / Proteome (Healthy Individual) I = Genome / Proteome of the Invader / Pathogen *Present: 1600 drugs (Drugbank) targeted to ~1000 (500 human + 500 pathogen) targets *Andersen et al., Nature Reviews Drug Discovery, 10, 579 (2011)
  • 8. Supercomputing Facility for Bioinformatics & Computational Biology IITD Target Directed Lead Molecule Design Active Site
  • 9. From Genome to Drug : Sketching a Computational Pathway for Modern Drug Discovery ChemGenome Drug DNA Sanjeevini mRNA SEEARTINSCIENCE…… Polypeptide Sequence Genome Bhageerath Tertiary Structure Protein
  • 10. Supercomputing Facility for Bioinformatics & Computational Biology IITD Hepatitis B virus (HBV) is a major blood-borne pathogen worldwide. Despite the availability of an efficacious vaccine, chronic HBV infection remains a major challenge with over 350 million carriers. No. HBV ORF Protein Function 1 ORF P Viral polymerase DNA polymerase, Reverse transcriptase and RNase H activity[36,48]. 2 ORF S HBV surface proteins Envelope proteins: three in-frame start (HBsAg, pre-S1 and codons code for the small, middle and pre-S2) the large surface proteins[36,49,50]. The pre-S proteins are associated with virus attachment to the hepatocyte[51] 3 ORF C Core protein HBeAg 4 ORF X HBx protein and HBcAg: forms the capsid [36]. HBeAg: soluble protein and its biological function are still not understood. However, strong epidemiological associations with HBV replication[52] and risk for hepatocellular carcinoma are known[42]. Transactivator; required to establish infection in vivo[53,54]. Associated with multiple steps leading to hepatocarcinogenesis[45].
  • 11. Supercomputing Facility for Bioinformatics & Computational Biology IITD United States FDA approved agents for anti-HBV therapy Agent Interferon alpha Peginterferon alpha2a Lamivudine Adefovir dipivoxil Tenofovir Entecavir Telbivudine Mechanism of action / class of drugs Immune-mediated clearance Immune-mediated clearance Nucleoside analogue Nucleoside analogue Nucleoside analogue Nucleoside analogue Nucleoside analogue Resistance to nucleoside analogues have been reported in over 65% of patients on long-term treatment. It would be particularly interesting to target proteins other than the viral polymerase.
  • 12. Supercomputing Facility for Bioinformatics & Computational Biology IITD Input the HBV Genome sequence to ChemGenome Hepatitis B virus, complete genome NCBI Reference Sequence: NC_003977.1 >gi|21326584|ref|NC_003977.1| Hepatitis B virus, complete genome ChemGenome 3.0 output Five protein coding regions identified Gene 2 (BP: 1814 to 2452) predicted by the ChemGenome 3.0 software encodes for the HBV precore/ core protein (Gene Id: 944568)
  • 13. Supercomputing Facility for Bioinformatics & Computational Biology IITD >gi|77680741|ref|YP_355335.1| precore/core protein [Hepatitis B virus] MQLFPLCLIISCSCPTVQASKLCLGWLWGMDIDPYKE FGASVELLSFLPSDFFPSIRDLLDTASALYREALESPEH CSPHHTALRQAILCWGELMNLATWVGSNLEDPASREL VVSYVNVNMGLKIRQLLWFHISCLTFGRETVLEYLVS FGVWIRTPPAYRPPNAPILSTLPETTVVRRRGRSPRRR TPSPRRRRSQSPRRRRSQSRESQC Input Amino acid sequence to Bhageerath-H
  • 14. Supercomputing Facility for Bioinformatics & Computational Biology IITD Input Protein Structure to Active site identifier (ASF/Sanjeevini) 10 potential binding sites identified Scan a million compound library RASPD/Sanjeevini calculation with an average cut off binding affinity to limit the number of candidates. (Empirical scoring function which builds in Lipisnki‟s rules and Wiener index) RASPD output 2057 molecules were selected with good binding energy from one million molecule database corresponding to the top 5 predicted binding sites.
  • 15. Supercomputing Facility for Bioinformatics & Computational Biology IITD Out of the 2057 molecules, top 40 molecules are given as input to ParDOCK/Sanjeevini for atomic level binding energy calculations. Out of this 40, (with a cut off of -7.5 kcal/mol), 24 molecules are seen to bind well to precore/core protein target. These molecules could be tested in the Laboratory. Mol. ID 0001398 0004693 0007684 0007795 0008386 0520933 0587461 0027252 0036686 0051126 0104311 0258280 0000645 0001322 0001895 0002386 0003092 0001084 0002131 0540853 1043386 0088278 0043629 0097895 Binding Energy (kcal/mol) -10.14 -8.78 -10.05 -9.06 -8.38 -8.21 -10.22 -8.39 -8.33 -8.73 -9.3 -7.8 -7.89 -8.23 -9.49 -8.53 -8.35 -8.68 -8.07 -11.08 -10.14 -9.16 -7.5 -8.04
  • 16. Supercomputing Facility for Bioinformatics & Computational Biology IITD 24 hit molecules for precore/core protein target of HBV
  • 17. Supercomputing Facility for Bioinformatics & Computational Biology IITD From Genome to Hits Genome X Teraflops Chemgenome Bhageerath Sanjeevini Hits Anjali Soni, K. M. Pandey, P. Ray, B. Jayaram, “Genomes to Hits in Silico: A Country Path Today, A Highway Tomorrow: A case study of chikungunya”, Current Pharmaceutical Design, 2013, 19, 46874700, DOI: 10.2174/13816128113199990379.
  • 18. ChemGenome A Physico-Chemical Model for identifying signatures of functional units on Genomes Protein-nucleic acid interaction propensity parameter Stacking Energy Gene Non Gene Protein-Nucleic acid interaction parameter Hydrogen Bonding Stacking Energy
  • 19. Physico-chemical fingerprints of RNA Genes Varsha Singh, Ankita Singh, Kritika Karri, Garima Khandelwal, B. Jayaram, to be submitted, 2014.
  • 20. Supercomputing Facility for Bioinformatics & Computational Biology IITD Distinguishing Genes (blue) from Non-Genes (red) in ~ 900 Prokaryotic Genomes A B C D E F Three dimensional plots of the distributions of gene and non-gene direction vectors for six best cases (A to F) calculated from the genomes of (A) Agrobacterium tumefaciens (NC_003304), (B) Wolinella succinogenes (NC_005090), (C) Rhodopseudomonas palustris (NC_005296), (D) Bordetella bronchiseptica (NC_002927), (E) Clostridium acetobutylicium (NC_003030), (F) Bordetella pertusis (NC_002929) Poonam Singhal, B. Jayaram, Surjit B. Dixit & David L. Beveridge, Molecular Dynamics Based Physicochemical Model for Gene Prediction in Prokaryotic Genomes, Biophys. J., 2008, 94, 4173-4183.
  • 22. FTP Directory: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/CHR_X/ DEFINITION Homo sapiens chromosome X (196 MB) ORIGIN 1 ccaggatggt ccttctcctg aaggttaatc cataggcaga tgaatcggat attgattcct 61 gttcttggaa taatctagag gatctttaga atccattggg attcataatc acagctatgc 121 cgatgccatc atcaccggct tagccctttc tgaaaacaca gtcatcatct acccccattg 181 gaatcacgat gcaaaaaacc tgtcccaaag cggtggtttc ctatgtgatt cttgcatcca 241 ggacaaatga cagtcagcag agaggcgccc tgttccatct tttggtttga tccagttaaa 301 ggcacacacg tgagcaccca acgtttgcca actcagcact gggcagagcc tggcctctga 361 ggaaattggc atcttcgtaa tcaatatatt attatgtttt attgaaatgt aagtcattgc….. DEFINITION Homo sapiens chromosome Y (25 MB) ORIGIN 1 ggtttcacca agttggccag gctggtctcg aactcctgac ctcaggtgat ctgtccacct 61 cggtgtccca aagtgctggg attacaggtg tgaaccacca cacccagcct catgtaatac 121 ttaaaaatga actacaggtg gattacaaac ctgaatatca aagaaaactt ttttttttga 181 aaaatagagg gaaatgtctt ataacctcag agttaggagg tttttcttag atacaataca 241 aaaagcataa ccacgcccat agtcccagct actcaggagg ctgaggcata agaatcactt 301 gagctcgaga ggtggaggtt gcagtgagcc gagatcctgc cattgcactc cagctgaggc 361 tacagagtga gagtataaaa aaaaaaaaaa aagcataacc tttaaaaatg ggttagccta
  • 23. Supercomputing Facility for Bioinformatics & Computational Biology IITD Let us read the book of Human Genome soon like a Harry Potter novel ! Human Genome 3000 Mb Gene & Gene related Sequences Extra-genic DNA 2100 Mb 900 Mb Coding DNA 90 Mb (3%) !!! Non-coding DNA Repetitive DNA Unique & low copy number 420 Mb 1680 Mb 810 Mb Tandemly repeated DNA Satellite, micro-satellite, mini-satellite DNA Interspersed genome wide repeats LTR elements, Lines, Sines, DNA Transposons
  • 24. The Grand Challenge of Protein Tertiary Structure Prediction The Bhageerath Pathway -------GLU ALA GLU MET LYS ALA SER VAL THR VAL LEU THR ALA LEU GLY ALA HIS GLU ALA GLU LEU LYS PRO LEU ALA LYS ILE PRO ILE LYS TYR LEU GLU PHE LEU HIS---- GLU ILE GLN ILE ASP LEU SER SER LEU LYS HIS GLU LYS LYS ALA ALA LYS LYS THR ILE HIS GLY LYS ILE GLY HIS HIS HIS
  • 25. Supercomputing Facility for Bioinformatics & Computational Biology IITD The 20 amino acids differ from each other in the side chain (shown in pink)
  • 26. Supercomputing Facility for Bioinformatics & Computational Biology IITD Proteins are polymers of amino acids with unlimited potential variety of structures and metabolic activities. 1. 2. 3. 4. 5. 6. 7. 8. 9. Structural proteins Protective proteins Contractile proteins Transport proteins Storage proteins Toxins Hormones Enzymes .......
  • 27. Supercomputing Facility for Bioinformatics & Computational Biology IITD + c  ω Y http://www.rcsb.org/pdb/home Biological Macromolecular Resource
  • 28. Supercomputing Facility for Bioinformatics & Computational Biology IITDelhi Protein Folding Problem Grand Challenge NP Complete (hard) problem. Protein Structure Prediction is an open challenge in life sciences / physical sciences and Computer Science. “Native structure” at the bottom of the free energy Funnel. Thermodynamic hypothesis of Anfinsen
  • 29. Supercomputing Facility for Bioinformatics & Computational Biology IITD WHY FOLD PROTEINS ? Proteins Hormones & factors DNA & nuclear receptors Ion channels Unknown “Proteins”- Majority of Drug Targets • • Structure-based drug-design Mapping the functions of proteins in metabolic pathways. Experimental Approach Computational approaches X-ray crystallography NMR spectroscopy Comparative methods Ab initio methods Cost > $100K ~ $1M Thousands of dollars ?? Time at least 6 months at least 6 months Days Accuracy very high very high Days Depend on the similarity of homologous template. Moderate Prone to failure as crystallizing a Prone to failure, Require a homologous Sampling and scoring protein is still like an art and many and is only template with at least 30% limitations Limitation proteins (e.g. membrane) cannot be applicable to similarity. The accuracy is crystallized. small proteins significantly reduced when (<150 amino the similarity is low. acids).
  • 30. Supercomputing Facility for Bioinformatics & Computational Biology IITD Computational Requirements for ab initio Protein Folding Strategy A • Generate all possible conformations and find the most stable one. Strategy B • Start with a straight chain and solve F = ma to capture the most stable state • For a protein comprising 200 AA • A 200 AA protein evolves assuming 2 degrees of freedom per AA ~ 10-10 sec / day / processor • 2200 • 10-2 sec => 108 days Structures => 2200 Minutes to optimize and find free energy. 2200 Minutes = 3 x 1054 Years! ~ 106 years With a million processors ~ 1 year Anton machine is making „Strategy B‟ viable for small proteins: David E. Shaw, Paul Maragakis, Kresten Lindorff-Larsen, Stefano Piana, Ron O. Dror, Michael P. Eastwood, Joseph A. Bank, John M. Jumper, John K. Salmon, Yibing Shan, and Willy Wriggers, "Atomic-Level Characterization of the Structural Dynamics of Proteins," Science, vol. 330, no. 6002, 2010, pp. 341–346.
  • 31. Supercomputing Facility for Bioinformatics & Computational Biology IITD Bhageerath Strategy: A Case Study of Mouse C-Myb DNA Binding (52 AA) LIKGPWTKEEDQRVIELVQKYGPKRWSVIAKHLKGRIGKQCRERWHNHLNPE Sequence Preformed Secondary Structure 16384 Trial Structures Biophysical Filters & Clash Removal 10632 Structures Energy Scans RMSD=2.87, Energy Rank=1774 RMSD=4.0 Ang, Energy Rank=4 Blue: Native; Red: Predicted
  • 32. Supercomputing Facility for Bioinformatics & Computational Biology IITD Development of a homology / ab initio hybrid server for proteins of all sizes Bhageerath-H Protocol Overall strategy: (1) Generate several plausible candidate structures by a mix of methods & (2) Score them to realize near-native structures B. Jayaram, Priyanka Dhingra, Baharat Lakhani, Shashank Shekhar, “Bhageerath: Attempting the Near Impossible – Pushing the Frontiers of Atomic Models for Protein Tertiary Structure Prediction”, J Chemical
  • 33. Supercomputing Facility for Bioinformatics & Computational Biology IITD All the (eight) modules of the protocol are currently implemented on a dedicated 280 AMD Opteron 2.4 GHz processor cluster (~3 teraflops). Jayaram et al., Bhageerath, Nucl. Acid Res., 2006, 34, 6195-6204
  • 34. Supercomputing Facility for Bioinformatics & Computational Biology IITD R,B,P,G Reexamining the language of amino acids (1) RRR (2) RRB (3) RRP (4) RRG (5) BBR (6) BBB (7) BBP (8) BBG R,B,P,G R,B,P,G 64 coloured triangles are possible. By virtue of the symmetries of the triangle, only 20 of these are unique. (9) PPR (13) GGR (10) PPB (14) GGB (11) PPP (15) GGP (12) PPG (16) GGG Some observations I. Any color occurs in exactly 10 triangles R (1,2,3,4,5,9,13,17,18,19); B (2,5,6,7,8,10,14,17,18,20); P (3,7,9,10,11,12,15,18,19,20); G (4,8,12,13,14,15,16,17,19,20) II. Any two distinct colors occur together in 4 triangles R & B (2,5,17,18); R & P (3,9,18,19); R & G (4,13,17,19) B & P (7,10,18,20); B & G (8,14,17,20) ; P & G (12,15,19,20) III. Any three distinct colors occur together in only one triangle R, B & G (17); R, B & P (18); R, P & G (19); B, P &G (20) IV. All sides with same color occurs only once R (1); B (6); P (11); G (16) (17) RBG (18) RBP (19) RPG (20) BGP
  • 35. Supercomputing Facility for Bioinformatics & Computational Biology IITD Rule 1. Amino acid side chains have evolved based on four chemical properties. A minimum of one and a maximum of three properties are used to specify each amino acid. Rule 2. Each property occurs in exactly 10 amino acids. Rule 3. Any two properties occur simultaneously in only four amino acids. Rule 4. Any three properties occur simultaneously in only one amino acid. Rule 5. Amino acids characterized by a single property occur only once. Text book classifications do not satisfy the above rules! Either the above rules are irrelevant to amino acids or we need to revise our understanding of the language of proteins. Jayaram, B.. Decoding the Design Principles of Amino Acids and the Chemical Logic of Protein Sequences. Available from Nature Precedings. http://hdl.handle.net/10101/npre.2008.2135.1 200
  • 36. Supercomputing Facility for Bioinformatics & Computational Biology IITD Property (I): Presence of sp3 hybridized γ carbon atom. (a) Exactly 10 amino acids {E, I, K, L, M, P, Q, R, T, V} possess this property as required by Rule 2 above. Property (II): Hydrogen bond donor ability. (a) Exactly 10 amino acids {C, H, K, N, Q, R, S, T, W, Y} possess this property. (b) Also, only four amino acids (K, Q, R, T) exhibit both properties (I & II together) as required by Rule 3. Property (III): Absence of δ carbon. (a) Exactly 10 amino acids {A, C, D, G, I, M, N, S, T, V} have this property. Ile is included in this set as one of the branches of its side chain is lacking in a δ carbon. (b) I and III occur simultaneously in only four amino acids (I, M, T, V) and similarly II and III occur simultaneously in only four amino acids (C, N, S, T). (c) Rule 4 requires that the above three properties (I, II and III) occur simultaneously in only one amino acid (T) and this conforms to the expectation.
  • 37. Supercomputing Facility for Bioinformatics & Computational Biology IITD The most likely candidate for property (IV): Absence of branching. Linearity of the side chains / non-occurrence of bidentate forks with terminal hydrogens in the side chains. (a) This pools together 10 amino acids in the set {A, D, E, F, H, K, M, P, S, Y}. Side chains with single rings are treated as without forks. The sulfhydryl group in Cys and its ability to form disulfide bridges requires it to be treated as forked. Accepting that this property (IV) satisfies Rule 2, (b) Rule 3 is satisfied by I and IV (E, K, M, P); by II and IV (H, K, S, Y) and by III and IV (A, D, M, S). (c) Also, Rule 4 is satisfied by I, II and IV (K), by I, III and IV (M) and by II, III and IV (S). With all the four properties (I, II, III and IV) specified, amino acids characterized by a single property occur only once: property I (L), property II (W), property III (G) and property IV (F), consistent with Rule 5.
  • 38. Supercomputing Facility for Bioinformatics & Computational Biology IITD 20 amino acids and some chemical properties of their side chains I. Presence of sp3 hybridized  carbon (g) II. Presence of hydrogen bond donor group (d) III. Absence of  carbon (s) IV. Absence of forks with hydrogens (l) Assignment # Amino acid A Alanine No No Yes Yes g0d0s2l1 C Cysteine No Yes Yes No g0d1s2l0 D Aspartate No No Yes Yes g0d0s1l2 E Glutamate Yes No No Yes g1d0s0l2 F Phenylalanine No No No Yes g0d0s0l3 G Glycine No No Yes No g0d0s3l0 H Histidine No Yes No Yes g0d2s0l1 I Isoleucine Yes No Yes No g2d0s1l0 K Lysine Yes Yes No Yes g1d1s0l1 L Leucine Yes No No No g3d0s0l0 M Methionine Yes No Yes Yes g1d0s1l1 N Asparagine No Yes Yes No g0d2s1l0 P Proline Yes No No Yes g2d0s0l1 Q Glutamine Yes Yes No No g1d2s0l0 R Arginine Yes Yes No No g2d1s0l0 S Serine No Yes Yes Yes g0d1s1l1 T Threonine Yes Yes Yes No g1d1s1l0 V Valine Yes No Yes No g1d0s2l0 W Tryptophan No Yes No No g0d3s0l0 Y Tyrosine No Yes No Yes g0d1s0l2
  • 39. Supercomputing Facility for Bioinformatics & Computational Biology IITD Amino acid chemical logic based protein structure modeling s l d INPUT: Amino Acid Query Sequence Scanning Through Protein Structure Template Selection and Query Sequence–Template Structure Alignment Based Upon New Chemical Logic of Protein Sequences Library Template Dependent Model Structure Generation ( FULL LENGTH or FRAGMENT ) FULL LENGTH FULL LENGTH OUTPUT: Top 5 model Structures Fragment Assembly to Generate Full Length Loop Optimization, Energy Scoring and Ranking of Full Length Candidate Structures
  • 40. Supercomputing Facility for Bioinformatics & Computational Biology IITD “Deployment” of a Structural Metric for Capturing Native I. Structure Metric II. Surface Area Metric III. Energy Metric Total Sample Space of decoys Decoys Decoys Decoys 5% Selection 50% Selection 25% Selection Top 5
  • 41. Supercomputing Facility for Bioinformatics & Computational Biology IITD pcSM ( physico-chemical scoring function) Avinash Mishra, Satyanarayan Rao, Aditya Mittal and B. Jayaram, “Capturing Native/Native-like Structures with a PhysicoChemical Metric (pcSM) in Protein Folding”, BBA proteins & proteomics, 2013, DOI: 10.1016/j.bbapap.2013.04.023.
  • 42. Supercomputing Facility for Bioinformatics & Computational Biology IITD From Sequence to Structure: The Latest Bhageerath-H Pathway Input amino acid sequence Bhageerath-H Strgen for protein decoy generation DECOY GENERATION Amino acids chemical logic based decoy generation pcSM scoring for top 5 decoy selection Quantum (AM1) structure refinement Top 5 native-like candidate structures NATIVE LIKE STRUCTURE SELECTION STRUCTURE REFINEMENT
  • 43. Supercomputing Facility for Bioinformatics & Computational Biology IITD Results at a Glance Input sequences : 67 CASP10 targets for which natives are released 1. Homology + de novo (Bhageerath-H at CASP10: 45%) 2. Exhaustive Homology + improved de novo (Post CASP10: 52%) 3. New alignments with chemical logic of AAs + improved de novo (61%) 4 & 5. Classical & 6. QM refinement (66%) 7. Inaccessible conformational space as of June, 2013 (34%) Bhageerath-H at CASP10 (July, 2012):45% Best server at CASP10 (July, 2012): 55% Union of best predictions of top 70 servers at CASP10 (July, 2012): 60% Bhageerath-H (June, 2013): 66% Further Challenges: Going beyond 66% & improving resolution to ~ 2 to 3 Å
  • 44. Path to Mt. Everest is worked out! Crossed CAMP III (66%)!!
  • 45. Supercomputing Facility for Bioinformatics & Computational Biology IITD Bhageerath-H: A Freely Accessible Web Server http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp The user inputs the amino acid sequence & five candidate structures for the native are emailed back to the user
  • 46. Supercomputing Facility for Bioinformatics & Computational Biology IITD Target Directed Lead Molecule Design Sanjeevini B. Jayaram, Tanya Singh et al., "Sanjeevini: a freely accessible web-server for target directed lead molecule discovery", BMC Bioinformatics, 2012, 13, S7. http://www.biomedcentral.com/14712105/13/S17/S7
  • 47. Supercomputing Facility for Bioinformatics & Computational Biology IITD COST & TIME INVOLVED IN DRUG DISCOVERY Target Discovery 2.5yrs 4% Lead Generation Lead Optimization 3.0yrs 15% Preclinical Development 1.0yrs 10% Phase I, II & III Clinical Trials 6.0yrs 68% FDA Review & Approval 1.5yrs 3% Drug to the Market 14 yrs $1.4 billion Source: PAREXEL‟s Pharmaceutical R&D Statistical Sourcebook, 2001, p96.; Hileman, Chemical Engg. News, 2006, 84, 50-1.
  • 48. Target Directed Lead Molecule Design Computer Aided Drug Design DRUG NON-DRUG
  • 49. De novo LEAD-LIKE MOLECULE DESIGN: THE SANJEEVINI PATHWAY Database User Candidate molecules Drug Target Identification Drug-like filters Mutate / Optimize Geometry Optimization Quantum Mechanical Derivation of Charges Assignment of Force Field Parameters 3-Dimensional Structure of Target Active Site Identification on Target & Ligand Docking Energy Minimization of Complex Binding free energy estimates - Scoring Molecular dynamics & post-facto free energy component analysis (Optional) Lead-like compound Jayaram et al., “Sanjeevini”, Indian J. Chemistry-A. 2006, 45A, 1834-1837.
  • 50. S a n je e v in i P a t h w a y or N R D B M /M illio n M o le c u le L ib r a r y /N a t u r a l P r o d u c t s a n d T h e ir D e r iv a t iv e s M o le c u la r D a ta b a s e T a r g e t P ro te in /D N A L ig a n d M o le c u le o r, U p lo a d B io a v a lib a lity C h e c k ( L ip in s k i C o m p lia n c e ) B in d in g E n e rg y E s tim a tio n b y R A S P D p ro to c o l P r e d ic tio n o f a ll p o s s ib le a c tiv e s ite s ( fo r p r o te in o n ly a n d if b in d in g s ite is n o t k n o w n ). G e o m e tr y O p tim iz a tio n T P A C M 4 /Q u a n tu m M e c h a n ic a l D e r iv a tio n o f C h a r g e s A s s ig n m e n t o f F o r c e F ie ld P a r a m e te r s L ig a n d M o le c u le r e a d y f o r D o c k in g P r o te in /D N A r e a d y f o r D o c k in g NH2 N N + OH or NH N N D o c k & S c o re M o le c u la r d y n a m ic s & p o s t- fa c to f r e e e n e r g y c o m p o n e n t a n a ly s is ( O p tio n a l)
  • 51. Supercomputing facility for bioinformatics and computational biology IIT Delhi Molecular Descriptors / Drug-like Filters Lipinski’s rule of five Molecular weight  500 Number of Hydrogen bond acceptors < 10 Number of Hydrogen bond donors <5 logP 5 Additional filters Molar Refractivity  140 Number of Rotatable bonds < 10
  • 53. http://www.scfbio-iitd.res.in/dock/ActiveSite_new.jsp Tanya Singh, D. Biswas, B. Jayaram, 2011, J. Chem. Inf. Modeling, 51 (10), 2515-2527.
  • 54. http://www.scfbio-iitd.res.in/software/drugdesign/raspd.jsp Goutam Mukherjee and B. Jayaram, "A Rapid Identification of Hit Molecules for Target Proteins via PhysicoChemical Descriptors", Phys. Chem. Chem. Phys., 2013, DOI:10.1039/C3CP44697B.
  • 55. Supercomputing facility for bioinformatics and computational biology IIT Delhi Quantum Chemistry on Candidate drugs for Assignment of Force Field Parameters -0.1206 0.1251 -0.0653 0.0083 0.1727 AM1 0.1251 6-31G*/RESP -0.1838 -0.0653 0.0166 -0.1838 TPACM-4 0.1382 -0.5783 0.1727 0.1335 -0.1044 -0.1718 0.0191 0.1191 -0.2085 0.0387 0.1191 -0.0516 -0.3440 0.1404 -0.0162 0.0796 0.0796 -0.0341 0.1191 -0.0099 0.1302 -0.0099 0.0796 -0.7958 -0.7958 G. Mukherjee, N. Patra, P. Barua and B. Jayaram, J. Computational Chemistry, 32, 893-907 (2011).
  • 57. Supercomputing Facility for Bioinformatics & Computational Biology IITD MONTE CARLO DOCKING OF THE CANDIDATE DRUG IN THE ACTIVE - SITE OF THE TARGET www.scfbio-iitd.res.in/dock/pardock.jsp + RMSD between the docked & the crystal structure is 0.2Å ENERGY MINIMIZATION 5 STRUCTURES WITH LOWEST ENERGY SELECTED A. Gupta, A. Gandhimathi, P. Sharma, and B. Jayaram, “ParDOCK: An All Atom Energy Based Monte Carlo Docking Protocol for Protein-Ligand Complexes”, Protein and Peptide Letters, 2007, 14(7), 632-646.
  • 58.
  • 59.
  • 60.
  • 61. Supercomputing facility for bioinformatics and computational biology IIT Delhi Binding Affinity Analysis After obtaining candidate molecules from docking an dscoring, molecular dynamics simulations followed by free energy analyses (MMPBSA/MMGBSA) are recommended. + G0 [Protein]aq + [Protein*Inhibitor*]aq II I [Protein*]aq [Inhibitor*]aq VI IV III [Protein*]vac [Inhibitor]aq V + [Inhibitor*]vac [Protein*Inhibitor*]vac Parul Kalra, Vasisht Reddy, B. Jayaram, “A Free Energy Component Analysis of HIV-I ProteaseInhibitor Binding”, J. Med.Chem., 2001, 44, 4325-4338.
  • 62. Supercomputing facility for Bioinformatics and Computational Biology IIT Delhi Affinity / Specificity Matrix for Drugs and Their Targets/Non-Targets Shaikh, S., Jain. T., Sandhu, G., Latha, N., Jayaram., B., A physico-chemical pathway from targets to leads, 2007, Current Pharmaceutical Design, 13, 3454-3470. Drug1 Drug2 Drug3 Drug4 Drug5 Drug6 Drug7 Drug8 Drug9 Drug10 Drug11 Drug12 Drug13 Drug14 Target1 Target2 Target3 Target4 Target5 Target6 Target7 Target8 Target9 Target10 Target11 Target12 Target13 Target14 BLUE: HIGH BINDING AFFINITY GREEN: MODERATE AFFINITY ORANGE: POOR AFFINITY Diagonal elements represent drug-target binding affinity and off-diagonal elements show drug-non target binding affinity. Drug 1 is specific to Target 1, Drug 2 to Target 2 and so on. Target 1 is lymphocyte function-associated antigen LFA-1 (CD11A) (1CQP; Immune system adhesion receptor) and Drug 1 is lovastatin.Target 2 is Human Coagulation Factor (1CVW; Hormones & Factors) and Drug 2 is 5-dimethyl amino 1-naphthalene sulfonic acid (dansyl acid). Target 3 is retinol-binding protein (1FEL; Transport protein) and Drug 3 is n-(4-hydroxyphenyl)all-trans retinamide (fenretinide). Target 4 is human cardiac troponin C (1LXF; metal binding protein) and Drug 4 is 1-isobutoxy-2-pyrrolidino-3[n-benzylanilino] propane (Bepridil). Target 5 is DNA {1PRP; d(CGCGAATTCGCG)} and Drug 5 is propamidine. Target 6 is progesterone receptor (1SR7; Nuclear receptor) and Drug 6 is mometasone furoate. Target 7 is platelet receptor for fibrinogen (Integrin Alpha-11B) (1TY5; Receptor) and Drug 7 is n-(butylsulfonyl)-o-[4-(4-piperidinyl)butyl]-l-tyrosine (Tirofiban). Target 8 is human phosphodiesterase 4B (1XMU; Enzyme) and Drug 8 is 3-(cyclopropylmethoxy)-n-(3,5-dichloropyridin-4-yl)-4-(difluoromethoxy)benzamide (Roflumilast). Target 9 is Potassium Channel (2BOB; Ion Channel) and Drug 9 is tetrabutylammonium. Target 10 is {2DBE; d(CGCGAATTCGCG)} and Drug 10 is Diminazene aceturate (Berenil). Target 11 is Cyclooxygenase-2 enzyme (4COX; Enzymes) and Drug 11 is indomethacin. Target 12 is Estrogen Receptor (3ERT; Nuclear Receptors) and Drug 12 is 4-hydroxytamoxifen. Target 13 is ADP/ATP Translocase-1 (1OKC; Transport protein) and Drug 13 is carboxyatractyloside. Target 14 is Glutamate Receptor-2 (2CMO; Ion channel) and Drug 14 is 2-({[(3e)-5-{4-[(dimethylamino)(dihydroxy)-lambda~4~-sulfanyl]phenyl}-8-methyl-2-oxo6,7,8,9-tetrahydro-1H-pyrrolo[3,2-H]isoquinolin-3(2H)-ylidene]amino}oxy)-4-hydroxybutanoic acid. The binding affinities are calculated using the software made available at http://www.scfbio-iitd.res.in/software/drugdesign/bappl.jsp and http://www.scfbio-iitd.res.in/preddicta.
  • 63. Supercomputing Facility for Bioinformatics & Computational Biology IITD Future of Drug Discovery: Towards a Molecular View of ADME/T D ru g S ite o f A d m in istra tio n Parenteral Route O r a l R o u te A b so rp tio n Resorption D istrib u tio n fro m P la sm a B ound D rug U nbound D rug M e ta b o lism E x c re tio n L iv e r B ile , S a liv a , S w e a t, S ite o f A c tio n K id n e y D rug T arget The distribution path of an orally administered drug molecule inside the body is depicted. Black solid arrows: Complete path of drug starting from absorption at site of administration to distribution to the various compartments in the body, like sites of metabolism, drug action and excretion. Dashed arrows: Path of the drug after metabolism. Dash-dot arrows: Path of drug after eliciting its required action on the target. Dot arrows: Path of the drug after being reabsorbed into circulation from the site of excretion. Affinity/specificity are under control but toxicity is yet to be conquered.
  • 64. OVERVIEW OF METABOLISM AND TRANSPORT IN P.falciparum
  • 65. Supercomputing Facility for Bioinformatics & Computational Biology IITD From Genome to Hits Genome X Teraflops Chemgenome Bhageerath Sanjeevini Hits
  • 66. Supercomputing Facility for Bioinformatics & Computational Biology IITD Chikungunya Virus Chikungunya is one of the most important re-emerging viral borne disease spreading globally with sporadic intervals. It is categorized as a BSL3 pathogen and under „C‟ grade by National Institute of Allergy and Infectious Diseases (NIAID), in 2008. But, yet no approved drug/vaccine is available currently in the public domain for its treatment/prevention. Anjali Soni, Khushhali Menaria, Pratima Ray and B. Jayaram. “Genomes to Hits in Silico: A Country Path Today, A Highway Tomorrow: A case study of chikungunya”, Current Pharmaceutical Design, 2013, 19, 4687-4700 .
  • 67. Supercomputing Facility for Bioinformatics & Computational Biology IITD Some available information on CHIKV proteins but no structures Protein Type Proteins Functions NonStructural nsP1  Methyl transferase domain (acts as cytoplasmic capping enzyme) Proteins nsP2  Viral RNA helicase domain (part of the RNA polyemerase complex)  Peptidase C9 domain (cleaves four mature proteins from non structural polyprotein) nsP3  Appr. 1-processng domain (minus strand and subgenomic 26S mRNA synthesis) nsP4  Viral RNA dependent RNA polymerase domain (Replicates genomic and antigenomic RNA and also transcribes 26S subgenomic RNA which encodes for structural proteins) Structural C  Peptidase_S3 domain (autocatalytic cleavage) proteins E3  Alpha virus E3 spike glycoprotein domain E2  Alpha virus E2 glycoprotein domain (viral attachment to host) 6K  Alpha virus E1 glycoprotein domain (viral glycoprotein processing and membrane permeabilization)  Signal peptide domain E1  Alpha virus E1 glycoprotein domain (class II viral fusion protein)  Glycoprotein E dimerization domain (forms E1-E2 heterodimers in inactive state and E1 trimers in active state)
  • 68. Supercomputing Facility for Bioinformatics & Computational Biology IITD Flow diagram illustrating the steps involved in achieving hit molecules from genomic information Whole genome sequence of Chikungunya virus is retrieved from NCBI: NC_004162.2 http://www.ncbi.nlm.nih.gov/nuccore/27754751?report=fasta Genes are predicted which are then translated to the protein sequences using Chemgenome 3.0 http://www.scfbio-iitd.res.in/chemgenome/chemgenome3.jsp These polyprotein sequences are spliced w.r.t literature and results are processed for 3-D structure prediction by Bhageerath-H http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp Modeled structures are studied for identification of potential active sites by active site finder AADS http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp A million compound library of small molecules is screened against the predicted binding sites using RASPD http://www.scfbio-iitd.res.in/software/drugdesign/raspd.jsp The screened molecules are docked, scored and optimized iteratively using SANJEEVINI http://www.scfbio-iitd.res.in/sanjeevini/sanjeevini.jsp Hits ready to be synthesized and tested in laboratory
  • 69. Supercomputing Facility for Bioinformatics & Computational Biology IITD Input the CHIKV Genome sequences to ChemGenome 3.0: Chikungunya virus (strain S27-African prototype), complete genome NCBI Ref_Sequence: NC_004162.2 ChemGenome 3.0 output Two protein coding regions are identified. These proteins are the polyproteins. Genes Start End Type 1 77 7501 Nonstructural Polyproteins 2 7567 11313 Structural Polyproteins
  • 70. Supercomputing Facility for Bioinformatics & Computational Biology IITD The nonstructural polyproteins are cleaved into 4 protein sequences w.r.t literature. These sequences serve as input to Bhageerath-H server. Bhageerath-H output
  • 71. Supercomputing Facility for Bioinformatics & Computational Biology IITD Input Protein Structures to an Automated version of Active site finder (AADS/Sanjeevini) 10 potential binding sites are identified against each model of the proteins (shown as black dots in the figure) Scanning against a million compound library RASPD/Sanjeevini calculations were carried in search of the potential therapeutics with an average cut-off binding affinity to limit the number of candidates. (RASPD uses an empirical scoring function which builds in Lipinski‟s rules and Wiener index).
  • 72. Supercomputing Facility for Bioinformatics & Computational Biology IITD RASPD output Top 100 molecules were screened with the cutoff binding energy to be -8.00 kcal/mol. Out of these 100, one molecule for each model is selected with good binding energy from one million molecule database corresponding to the top 5 predicted binding sites. The molecules were choosen for atomic level binding energy calculations using ParDOCK/Sanjeevini. These molecules could be tested in the Laboratory.
  • 73. Supercomputing Facility for Bioinformatics & Computational Biology IITD In silico suggestions of candidate molecules against CHIKV
  • 74. Supercomputing Facility for Bioinformatics & Computational Biology IITD In progress Sub-micromolar compounds identified using Sanjeevini (after design, synthesis & testing) against Malaria (Jamia Millia & ICGEB), Breast Cancer (Amity), Alzheimers (KSBS, IITD) & Diabetes (IITD) These are under further development
  • 75. Supercomputing Facility for Bioinformatics & Computational Biology IITD SCFBio Team 16 processor Linux Cluster Storage Area Network ~ 10 teraflops of computing; 50 terabytes of storage
  • 76. Supercomputing Facility for Bioinformatics & Computational Biology IITD BioComputing Group, IIT Delhi (PI : Prof. B. Jayaram) Shashank Shekhar Priyanka Dhingra Ashutosh Shandilya Varsha Singh Prashant Rana Manpreet Singh Preeti Bisht Present Goutam Mukherjee Vandana Shekhar Abhilash Jayaraj Ankita Singh Rahul Kaushik Samar Chakraborty Sanjeev Kumar Dr. Achintya Das Dr. Tarun Jain Dr. Kumkum Bhushan Dr. Nidhi Arora Pankaj Sharma A.Gandhimathi Neelam Singh Dr. Sandhya Shenoy Sahil Kapoor Navneet Tomar Former Dr. N. Latha Dr. Saher Shaikh Dr. Poonam Singhal Dr. Garima Khandelwal Praveen Agrawal Gurvisha Sandhu Shailesh Tripathi Rebecca Lee Satyanarayan Rao Surojit Bose Tanya Singh Avinash Mishra Anjali Soni Debarati Dasgupta Ali Khosravi R. Nagarajan Dr. Pooja Narang Dr. Parul Kalra Dr. Surjit Dixit Dr. E. Rajasekaran Vidhu Pandey Anuj Gupta Dhrubajyoti Biswas Bharat Lakhani Pooja Khurana Kritika Karri
  • 77. Under Incubation at IITD (since April, 2011) Recipient of TATA NEN 2012 Award Recipient of Biospectrum 2013 Award Recipient of BioAsia 2014 Award Novel Technologies Computational Network Genomics Target Discovery Proteomics Compound Screening NI research pipeline Sahil Kapoor Avinash Mishra Shashank Shekhar Hit Molecules
  • 78. Supercomputing Facility for Bioinformatics & Computational Biology IITD Acknowledgements Department of Biotechnology Department of Science & Technology Ministry of Information Technology Council of Scientific & Industrial Research Indo-French Centre for the Promotion of Advanced Research (CEFIPRA) HCL Life Science Technologies Dabur Research Foundation Indian Institute of Technology, Delhi