Podium Presentation at SLAS Annual Meeting, San Diego, Feb 4-8, 2012, entitled "Screening Heuristics and Chemical Property Bias: New Directions for Lead Identification and Optimization"
1. Screening Heuristics & Chemical Property Bias: New Directions for Lead Identification and Optimization
Andy Pope
Platform Technology & Science, GlaxoSmithKline, Collegeville PA, USA
SLAS 2012, San Diego
February 4-8, 2012
4. Some available datasets inside GSK
[Figure: overview of datasets available inside GSK, each carrying descriptor metadata. Hit ID data: HTS, focused-set (FS) and target-class screens, ELT and FBDD. GSK compound structures and properties. Compound profiles: phys-chem, DMPK and safety profiling. Public data: e.g. PubChem, literature, Connectivity Maps and marketed drugs.]
Other GSK Data – e.g. genomic, bio-informatic, clinical
5. 300+ HTS Campaigns – 2004-11
[Figure: 2007-11 screens broken down by target class (13 classes) and by assay technology (15 classes), sized by count of screens.]
6. Twin approaches to screening heuristics
1. Building collective wisdom: capture, combine and share the experiences of screeners and data from screens (and screeners), e.g.
- How well do different assay methods perform?
- What is the impact of screen quality and what should be targeted in assay development?
- What policies do I need in place to have a high quality screening process?
2. New “big” data analysis/insights: look for data patterns in large aggregated datasets, e.g.
- Do chemical properties influence the results of screens?
- How are screen results related between targets and assay methods?
- Which is the best method to use to discover hits?
- Which assay technology works best?
- How are library properties reflected in the hits?
7. Building Collective Wisdom – a simple example
Some questions:
- What actually happens in practice as Z’ varies?
- What Z’ should we be aiming for?
- Is this affected by the type of assay?
- What is the appropriate trade-off between cost, robustness and sensitivity?
- How are we doing?
From SBS Virtual Seminar Series 2007 - HTS Module 1
8. Z’ Heuristics
- Z’ >0.8 is ideal, >0.7 acceptable
- Z’ <0.7: many aspects of performance degrade (e.g. failures, cycle times, false +ve/-ve, hit confirmation)
- Z’ vs “sensitivity” trade-off arguments may be based on false hunches
- Target & assay type does not make a major difference
[Figure: statistical cut-off (% effect), production failure rate (% of plates) and cycle time (weeks/campaign) vs average Z’ of assay in HTS production, in bins from 0.4-0.5 to >0.8.]
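The Z’ values in these heuristics are the usual Z’-factor computed from control wells: Z’ = 1 - 3(SD_pos + SD_neg)/|mean_pos - mean_neg|. The sketch below computes Z’ from plate controls and maps it onto the bands quoted above. The function names are illustrative, and estimating SD with the sample standard deviation (`statistics.stdev`) is an assumption.

```python
from statistics import mean, stdev

def z_prime(pos_controls, neg_controls):
    """Z'-factor: 1 - 3 * (SD_pos + SD_neg) / |mean_pos - mean_neg|."""
    spread = stdev(pos_controls) + stdev(neg_controls)
    window = abs(mean(pos_controls) - mean(neg_controls))
    return 1.0 - 3.0 * spread / window

def z_prime_band(z):
    """Map a Z' value onto the heuristic bands quoted on the slide."""
    if z > 0.8:
        return "ideal"
    if z > 0.7:
        return "acceptable"
    return "performance likely to degrade"

# Illustrative control wells (% effect), not real screening data
pos = [100.0, 98.0, 102.0, 100.0]
neg = [0.0, 2.0, -2.0, 0.0]
z = z_prime(pos, neg)
print(f"Z' = {z:.3f} ({z_prime_band(z)})")  # Z' = 0.902 (ideal)
```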
9. Properties, properties, properties…..
…. But do they affect screening data?
…. Are we selecting hits with the best properties?
…. Bottom line: high cLogP (greasiness) is BAD
…. This needs to be fixed at the start, i.e. in hit ID
…. And it tends to creep up during Lead Op.
10. Do compound molecular properties impact how they behave in screens?
Aggregate results from all 330 campaigns 2005-2010 with >500K tests
e.g. compound total polar surface area (tPSA) makes no difference
- Compounds with tPSA 80-85 Å²: 26M measured responses in this bin, of which 485k marked as “hit”
- Hit rate = 100*(485k/26M) = 1.86%
- “Hit” = % effect ≥ 3 RSD of the sample population in that specific screen
The total polar surface area (tPSA) is defined as the surface sum over all polar atoms:
- tPSA < 60 Å² predicts brain penetration
- tPSA > 140 Å² predicts poor cell penetration
[Figure: hit rate (%) for compounds in each tPSA bin vs polar surface area (tPSA, Å²).]
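The following is a minimal sketch of the hit-rate calculation described on this slide. It assumes that a response counts as a hit when it is at or above the screen mean plus 3 standard deviations of that screen's sample population. The ordinary sample SD stands in for the slide's "RSD", whose exact estimator is not given. Hit flags from all screens are then pooled into fixed-width property bins (5 Å² here, to match the 80-85 Å² example). `mark_hits` and `binned_hit_rate` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean, stdev

def mark_hits(responses, n_sd=3.0):
    """Flag responses >= mean + n_sd * SD of one screen's sample population (assumed rule)."""
    cutoff = mean(responses) + n_sd * stdev(responses)
    return [r >= cutoff for r in responses]

def binned_hit_rate(records, bin_width=5.0):
    """records: iterable of (property_value, is_hit) pooled across screens.

    Returns {bin lower edge: hit rate in %}, i.e. 100 * hits / measured responses per bin.
    """
    tested = defaultdict(int)
    hits = defaultdict(int)
    for value, is_hit in records:
        edge = (value // bin_width) * bin_width  # lower edge of the fixed-width bin
        tested[edge] += 1
        hits[edge] += int(is_hit)
    return {edge: 100.0 * hits[edge] / tested[edge] for edge in sorted(tested)}
```

Running every screen's (tPSA, hit flag) records through `binned_hit_rate` gives one hit rate per bin, which corresponds to the 485k hits among 26M responses counted in the 80-85 Å² bin.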
11. Size Matters……
Overall hit rate rises 1.7-fold across the middle 80% of the screening deck, i.e. a 70% rise in hit rate from MW = 270 to MW = 470
3.3-fold rise across the full MW range
[Figure: hit rate (%), % of compounds in each MW bin and cumulative % of compounds vs molecular weight (MW); the middle 80% of compounds spans MW 270-470; annotated hit rates 1.50% and 2.62% (middle 80%) and 1.2% and 4.0% (full range). Only bins containing 1M or more records are shown.]
12. Greasiness matters most……
Overall hit rate rises 2.9-fold across the middle 80% of the screening deck, i.e. from ClogP = 1 to ClogP = 5
4.1-fold rise across the full ClogP range
[Figure: hit rate (%), % of compounds in each ClogP bin and cumulative % of compounds vs ClogP; the middle 80% of compounds spans ClogP 1-5; annotated hit rates 1.14% and 3.31% (middle 80%) and 1.1% and 4.5% (full range). Only bins containing 1M or more records are shown.]
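The "middle 80%" ranges and fold-rise figures on the size and greasiness slides reduce to a pair of percentiles and a ratio of hit rates. The sketch below makes that arithmetic explicit. It takes the middle 80% as the 10th to 90th percentile of the deck's property values, computed with `statistics.quantiles` (default "exclusive" method; the slide does not say which percentile method was used). The helper names are hypothetical.

```python
from statistics import quantiles

def middle_80_bounds(values):
    """10th and 90th percentiles: the property span holding the middle 80% of compounds."""
    deciles = quantiles(values, n=10)  # 9 cut points: 10th, 20th, ..., 90th percentile
    return deciles[0], deciles[-1]

def fold_rise(hit_rate_low, hit_rate_high):
    """Fold change between two hit rates, and the equivalent % rise."""
    fold = hit_rate_high / hit_rate_low
    return fold, 100.0 * (fold - 1.0)

# Hit rates annotated for the middle 80% on the ClogP slide
fold, rise = fold_rise(1.14, 3.31)
print(f"{fold:.1f}-fold ({rise:.0f}% rise)")  # 2.9-fold (190% rise)
```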
13. HTS Promiscuity - cLogP
[Figure: cLogP vs Inhibition Frequency Index* (%), from compounds hitting ~1 target to compounds hitting >10% of targets.]
Note: compounds were required to have been run in ≥50 HTS and to have yielded >50% effect in at least one screen to be included.
*Inhibition frequency index (IFI) = % of screens in which the compound yielded >50% inhibition, where total screens run ≥ 50
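The footnote's IFI definition can be written out directly. The sketch below assumes each compound's history is a list of % effect values, one per HTS it was run in. The same counts also identify the "dark matter" compounds described on the next slide, which never exceeded 50% effect despite being run in more than 50 screens. Both function names are hypothetical.

```python
def inhibition_frequency_index(percent_effects, min_screens=50, active_cutoff=50.0):
    """IFI = % of screens in which the compound gave > active_cutoff % inhibition.

    Returns None if the compound was run in fewer than min_screens screens.
    (The promiscuity plot additionally required at least one >50% result.)
    """
    n = len(percent_effects)
    if n < min_screens:
        return None
    n_active = sum(1 for e in percent_effects if e > active_cutoff)
    return 100.0 * n_active / n

def is_dark_matter(percent_effects, min_screens=50, active_cutoff=50.0):
    """True if run in > min_screens screens without ever exceeding active_cutoff % effect."""
    return len(percent_effects) > min_screens and max(percent_effects) <= active_cutoff
```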
14. “Dark” Matter is small and polar
- Compounds which have not yielded >50% effect even once in >50 screens
[Figure: molecular weight (Da) vs cLogP for these compounds.]
15. Biases translate to full-curve follow-up and beyond
Property biases in primary HTS hit marking are propagated forward to dose-response follow-up
[Figure: % of compounds tested vs cLogP and vs molecular weight for single-concentration (SS) testing, full-curve (FC) testing and the FC – SS differential.]
- Elevated testing of large, lipophilic compounds in the full-curve phase of HTS
- Reduced testing of small, polar compounds in the full-curve phase of HTS
Note: plots represent data from 402M single-concentration responses & 2.1M full-curve results
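The FC – SS differential compares how each property bin is represented in the two testing phases. Below is a minimal sketch under two assumptions: the differential is the per-bin percentage of full-curve compounds minus the per-bin percentage of single-concentration compounds, and bins have a fixed width. Function names are hypothetical.

```python
from collections import Counter

def percent_by_bin(values, bin_width):
    """% of compounds falling in each fixed-width property bin (keyed by lower edge)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    total = sum(counts.values())
    return {edge: 100.0 * c / total for edge, c in counts.items()}

def fc_minus_ss(ss_values, fc_values, bin_width=1.0):
    """Per-bin FC % minus SS %: positive = over-represented in full-curve follow-up."""
    ss = percent_by_bin(ss_values, bin_width)
    fc = percent_by_bin(fc_values, bin_width)
    return {edge: fc.get(edge, 0.0) - ss.get(edge, 0.0)
            for edge in sorted(set(ss) | set(fc))}
```

A positive value at high cLogP and a negative value at low cLogP would correspond to the "elevated testing of large, lipophilic compounds" pattern above.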
16. Property bias detection at an individual screen level
e.g. screens with the largest response to cLogP
[Figure: hit rate as % of HR at cLogP = 3.5 vs cLogP for individual screens.]
17. Assay Technology vs. property bias
e.g. by assay technology, normalized to the HR for that screen at the median collection cLogP value
[Figure: hit rate as % of HR at cLogP = 3.5 vs cLogP, one curve per screen grouped by assay technology and colored by hit rate (%).]
- No clear origins in any metadata (assay technology, target class, screen quality etc.)
- …. But effects are detectable even at the single-screen level
18. Lipophilicity trends in PubChem HTS Data
Primary data from around 100 academic HTS campaigns obtained from PubChem BioAssay
- Lipophilicity: similar to GSK HTS
- Compound size: little effect ("pretty flat")
[Figure: hit rate (%) vs ClogP and vs MW, with annotated hit rates of 1.28% and 3.80% (ClogP panel) and 2.14% and 2.27% (MW panel); ClogP vs MW for each compound set.]
GSK screening deck (>50 HTSs, 2.01M cpds): ClogP = 0.00835*MW - 0.058, R² = 0.18
PubChem compounds (405k): ClogP = 0.00554*MW + 0.97, R² = 0.09
19. Not just HTS… Lipophilicity trends in kinase focused set screens
Primary data from ~50 focused screen campaigns against protein kinases
- Lipophilicity and size: similar to GSK HTS
[Figure: hit rate (% of cpds >50% I at 10 uM) vs ClogP and vs MW.]
20. Bias from other simple chemical properties?
[Figure: hit rate (%) vs cLogP, MW (HAC), fCsp3 (fraction of carbons that are sp3) and flexibility; panels labelled +ve and -ve.]
Correlation of each property with MW and with ClogP (R², sign of correlation):
Property       vs MW      vs ClogP
MW             1, +       0.21, +
ClogP          0.21, +    1.0, +
HAC            0.92, +    0.19, +
fCsp3          0.15, +    0.00
RotBonds       0.36, +    0.04, +
tPSA           0.16, +    0.08, -
Chiral         0.02, +    0.00
HetAtmRatio    0.02, -    0.34, -
Complexity     0.31, +    0.02, +
Flexibility    0.02, +    0.00
AromRings      0.22, +    0.16, +
HBA            0.11, +    0.10, -
HBD            0.01, +    0.02, -
21. Improving hit marking – Property Biasing
Property-biased hit marking: promote compounds with more attractive properties, demote compounds with less attractive properties
[Figure: response (% control) vs ClogP with a mean + 3 x RSD cut-off, and the resulting hit rate (%) and % compounds vs ClogP, comparing ordinary HTS hit marking with property-biased hit marking.]
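The slide does not give a formula for property-biased hit marking. The sketch below is therefore a hypothetical scheme that only illustrates the promote/demote idea. It starts from the ordinary mean + 3 × SD cut-off, with the sample SD standing in for RSD. It then shifts the cut-off by a fixed number of % effect units per ClogP unit away from a reference ClogP, so that lower-ClogP compounds need a smaller response and higher-ClogP compounds a larger one. `ref_clogp` and `shift_per_unit` are arbitrary illustration parameters, not values from the talk.

```python
from statistics import mean, stdev

def ordinary_cutoff(responses, n_sd=3.0):
    """Ordinary hit-marking cut-off: mean + n_sd * SD of the screen's responses."""
    return mean(responses) + n_sd * stdev(responses)

def property_biased_hits(responses, clogps, n_sd=3.0, ref_clogp=3.5, shift_per_unit=5.0):
    """HYPOTHETICAL property-biased hit marking.

    The cut-off moves by shift_per_unit (% effect) for each ClogP unit above or
    below ref_clogp: lower-ClogP compounds are promoted (lower cut-off) and
    higher-ClogP compounds are demoted (higher cut-off).
    """
    base = ordinary_cutoff(responses, n_sd)
    return [r >= base + shift_per_unit * (c - ref_clogp)
            for r, c in zip(responses, clogps)]
```

When every compound sits at `ref_clogp`, this reduces to ordinary hit marking. A real scheme would probably also bound the adjustment so that very low-ClogP compounds with negligible responses are not promoted.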
22. Evolving the screening collection…
GSK’s Compound Collection Enhancement (CCE) strategy
- Moving the HTS deck towards decreased size and lipophilicity, with the aim of improving chemical starting points
[Figure: % of compounds exceeding property limits (ClogP > 5, MW > 500) as % of total compounds in HTS test datasets, 2004 vs 2010 and Δ 2010 vs 2004; ClogP of new compounds by year, through 2011.]
CCE acquisition property bounds:
- 2004-05: Lipinski criteria (MW < 500, ClogP < 5)
- Most recently: MW < 360, ClogP < 3
- Inclusion of DPU lead-op cpds: MW < 500, ClogP < 5
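The property limits and acquisition bounds above translate into simple filters. The sketch below assumes compounds are available as (MW, ClogP) pairs. It computes the "% compounds exceeding property limit" metric (ClogP > 5, MW > 500) and checks candidates against the quoted acquisition bounds, treating them as strict inequalities as written. Constant and function names are hypothetical.

```python
# Limits used in the "% compounds exceeding property limit" plot
MAX_MW, MAX_CLOGP = 500.0, 5.0

# CCE acquisition bounds quoted on the slide, as (max MW, max ClogP), both exclusive
LIPINSKI_2004_05 = (500.0, 5.0)
MOST_RECENT = (360.0, 3.0)

def percent_exceeding(compounds, max_mw=MAX_MW, max_clogp=MAX_CLOGP):
    """compounds: list of (MW, ClogP). Returns (% with ClogP > limit, % with MW > limit)."""
    n = len(compounds)
    over_clogp = sum(1 for _mw, clogp in compounds if clogp > max_clogp)
    over_mw = sum(1 for mw, _clogp in compounds if mw > max_mw)
    return 100.0 * over_clogp / n, 100.0 * over_mw / n

def passes_acquisition_bounds(mw, clogp, bounds):
    """True if a candidate lies strictly inside the (max MW, max ClogP) bounds."""
    max_mw, max_clogp = bounds
    return mw < max_mw and clogp < max_clogp
```

Running `percent_exceeding` on the 2004 and 2010 decks and subtracting the two results gives the Δ 2010 vs 2004 bars.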
23. Can property biases translate into lead optimization?
Workflow: Med. chem → Biochemical target assay → Cellular “mechanistic” target assay → Rodent DMPK, efficacy model
Cellular assay as a “patient in a plate”… Or……. “biochemistry in a (grease-selective) bag”!
[Figure: pIC50 (Cell - Biochem) vs binned cLogP, example from a current Lead Optimization program; positive values = more potent in cell, negative values = more potent in biochem.]
- Cellular activity favors cLogP >4
- Good DMPK at cLogP <3
- Directional “pull” to more lipophilic cpds?
- Value of cellular assay?
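The y-axis of this plot is a per-compound potency shift, pIC50(cell) - pIC50(biochem), where pIC50 = -log10(IC50 in molar). The sketch below computes that shift from molar IC50 values and averages it per cLogP bin. The input layout, the use of a per-bin mean and the 1-unit bin width are assumptions, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

def pic50(ic50_molar):
    """pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_molar)

def cell_minus_biochem_by_bin(compounds, bin_width=1.0):
    """compounds: iterable of (clogp, cell_ic50_M, biochem_ic50_M).

    Returns {cLogP bin lower edge: mean pIC50(cell) - pIC50(biochem)};
    positive = more potent in cell, negative = more potent in biochem.
    """
    deltas = defaultdict(list)
    for clogp, cell_ic50, biochem_ic50 in compounds:
        edge = (clogp // bin_width) * bin_width
        deltas[edge].append(pic50(cell_ic50) - pic50(biochem_ic50))
    return {edge: mean(d) for edge, d in sorted(deltas.items())}
```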
24. Property bias in broad pharmacological profiling
Early safety cross screening panel (eXP)
[Figure: average % of assays giving IC50 <= 10 uM vs binned ClogP for GSK Lead Op. compounds 2009-11 (n = ~2500), GSK terminated leads & candidates, and marketed drugs (remaining panels n = ~1000 and n = ~400).]
Assays in the panel:
- GPCRs – 17
- Ion channels – 8
- Enzymes – 3
- Kinases – 4
- Nuclear receptors – 2
- Transporters – 3
- Phenotypic – 3 (Blue Screen, Cell Health, Phospholipidosis)
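The promiscuity metric plotted here is, for each compound, the percentage of eXP assays with IC50 ≤ 10 uM (equivalently pIC50 ≥ 5), averaged over the compounds in each ClogP bin. The panel listed has 40 assays in total. The sketch below assumes IC50s are given in uM, with `None` for assays where no IC50 was determined (counted as inactive). Both the input layout and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def percent_assays_active(ic50s_um, threshold_um=10.0):
    """% of panel assays with IC50 <= threshold; None means no IC50 determined (inactive)."""
    active = sum(1 for v in ic50s_um if v is not None and v <= threshold_um)
    return 100.0 * active / len(ic50s_um)

def average_promiscuity_by_bin(compounds, bin_width=1.0, threshold_um=10.0):
    """compounds: iterable of (clogp, [IC50 in uM for each panel assay]).

    Returns {ClogP bin lower edge: average % of assays giving IC50 <= threshold}.
    """
    per_bin = defaultdict(list)
    for clogp, ic50s in compounds:
        edge = (clogp // bin_width) * bin_width
        per_bin[edge].append(percent_assays_active(ic50s, threshold_um))
    return {edge: mean(v) for edge, v in sorted(per_bin.items())}
```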
25. Property bias in broad pharmacological profiling
Early safety cross screening panel (eXP): GSK Lead Op. compounds 2009-11 only (n = ~2500)
[Figure: average % of assays giving IC50 <= 10 uM vs binned ClogP; same assay panel as slide 24.]
26. Kinome profiling – no impact of cLogP
~400 kinase Lead Op compounds vs 300 protein kinases
[Figure: % inhibition values (>300 kinase assays) vs binned ClogP and by kinase structural classifier.]
27. Conclusions
Heuristic approaches allow both refinement of best practice and new insights
Standard screening processes favor the selection of lipophilic compounds
- A contributing factor in current issues with the property space occupied by drug leads/candidates
- Improvements in screening collections and analysis methods can overcome this, BUT
- All this effort is wasted if Lead Optimization pathways pull compounds back towards unfavorable property space!!
The very large datasets generated from screening have considerable value beyond the lifetime of individual campaigns
- Particularly crucial now that quality and cycle time problems are largely solved
- Many other examples exist beyond those shown here
- Please go look for these effects in your data!
28. Acknowledgements
Snehal Bhatt, Stuart Baddeley, James Chan, Sue Crimmin, Pat Brady, Tony Jurewicz, Emilio Diez, Darren Green, Glenn Hofmann, Maite De Los Frailes, Stephen Pickett, Stan Martens, Bob Hertzberg, Sunny Hung, Deb Jaworski, Jeff Gross, Ricardo Macarron, Subhas Chakravorty, Carl Machutta, Nicola Richmond, Julio Martin-Plaza, Jesus Herranz, Barry Morgan, Gonzalo Colmeranjo-Sanchez, Juan Antonio Mostacero, Dave Morris, Dwight Morrow, Mehul Patel, Amy Quinn, Geoff Quinique, Mike Schaber, Zining Wu, Ana Roa
…and numerous others who contributed to programs run by GSK 2004-2011…..
And colleagues...
Screening & Compound Profiling