The document discusses challenges in curating approved drug structures from public databases due to issues like structural variants, permutations of chiral centers, salt forms, mixtures, and other factors that lead to multiple representations of the same drug. This phenomenon of "multiplexing" has increased over time. While no single entity is fully responsible, drug companies do not always verify submitted structures, and multiple sources contribute variants without coordination. This makes it difficult for databases to determine a single canonical structure and limits consistency between sources. The author provides examples like Taxol to illustrate complex cases and advocates collaboration between sources to improve validation and merging of drug structure mappings.
1. Challenges of curating approved medicines:
Will the real drugs please stand up?
Chris Southan, representing the Database Team
NC-IUPHAR/BPS/GTPdb Biannual Meeting, Paris, October 2014
2. What is the total for approved drug structures?
Take your pick …
4. Explanations
• Discordance: distinctly different drug molecular representations from
different sources that we would recognise canonically as the same
bioactive substance
• These are split into multiple CIDs per drug (i.e. “multiplexed”) via the
PubChem chemistry rules due to:
– Permutation of R/S stereo centers
– Salt forms
– Mixtures
– Unresolved E/Z bonds
– Tautomers
– Isotopic derivatives including deuteration
5. Causes of drug structure multiplexing
• Inherent challenges and complexities of chemical representation
• Utility of PubChem depends on advanced rules applied to a submission-based
system
• Drug companies never verify their own structures in public databases
• Legacy of structure image primacy in documents
• No clear accountability for correctness of public approved drug structures
(companies? FDA? WHO(INN)? AMA(USAN)? Wikipedia? CAS?)
• Structural variants enter databases from general source proliferation,
large-scale patent extractions, chemical vendor submissions and
repeated exemplifications in journals
• The net effect is an inexorable increase in multiplexing, though not
necessarily in erroneous structures per se
12. Scale of the issue for approved drugs in PubChem:
multiplexing expansion from 2005 to 2014
13. So how are we doing in our database?
• Sets were salt-stripped for this comparison
• GTPdb (Oct 2014) has 983 approved drug CIDs concordant with either
ChEMBL or DrugBank
• But only 723 are 4-way concordant
• We will inspect the 152, 192 and 180 sectors for consensus expansion
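The concordance counts above come down to set intersections over salt-stripped CID lists. A minimal sketch in Python, assuming each source's approved-drug set is already available as a set of PubChem CIDs. The integers and the name `fourth_source` are made-up placeholders, not data from the slides, and the helper functions are illustrative rather than the team's actual pipeline:

```python
# Hypothetical CID sets per source; the integers are placeholders, not real drug CIDs.
sources = {
    "GTPdb": {1, 2, 3, 4, 5},
    "ChEMBL": {1, 2, 3, 6},
    "DrugBank": {1, 2, 4, 7},
    "fourth_source": {1, 2, 5, 8},  # the fourth set is not named in this section
}


def concordant_with_any(target, others):
    """CIDs in `target` that also appear in at least one of `others`."""
    return target & set().union(*others)


def concordant_with_all(sets):
    """CIDs present in every set (n-way concordance)."""
    return set.intersection(*sets)


gtpdb = sources["GTPdb"]
# Concordant with either ChEMBL or DrugBank
either = concordant_with_any(gtpdb, [sources["ChEMBL"], sources["DrugBank"]])
# Concordant across all four sources
four_way = concordant_with_all(list(sources.values()))
```

The gap between the two results is the kind of sector one would then inspect by hand for consensus expansion.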
14. Consequences and possible solutions to the
drug multiplexing issue
• Our drugs annotation Committee cannot magic these issues away
but their support is crucial
• Our consensus approach is useful and statistically defensible
• In the GTPdb we add curator comments and cross-pointers for key
multiplexed examples
• Sources that make the effort to collate drug structure sets should
cross-corroborate more
• A canonical approach to merging drug structure-to-bioactivity
mappings could be considered
• The inner connectivity layer of the InChIKey goes some way towards
this
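To illustrate the InChIKey point: the first hyphen-separated block of a standard InChIKey (14 characters) is a hash of the connectivity layers, so stereo and isotopic variants of the same skeleton share it, while salts and mixtures generally do not. A minimal sketch of grouping CIDs on that block, assuming (CID, InChIKey) pairs have already been collected. The function names are hypothetical and the records are invented strings in InChIKey layout, not real compounds:

```python
from collections import defaultdict


def connectivity_block(inchikey):
    """Return the first hyphen-separated block of an InChIKey."""
    return inchikey.split("-")[0]


def group_by_connectivity(records):
    """Group (cid, inchikey) pairs by their connectivity block."""
    groups = defaultdict(list)
    for cid, key in records:
        groups[connectivity_block(key)].append(cid)
    return dict(groups)


# Placeholder records: the CIDs and first blocks are invented for illustration.
records = [
    (101, "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),  # no stereo specified
    (102, "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),  # same skeleton, different second block
    (103, "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),  # different skeleton
]

groups = group_by_connectivity(records)
# Skeletons represented by more than one CID are candidates for merging.
multiplexed = {block: cids for block, cids in groups.items() if len(cids) > 1}
```

Because salt forms and mixtures keep distinct first blocks, a grouping like this would still need salt stripping beforehand, which is one reason the connectivity layer only goes some way.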