SlideShare uma empresa Scribd logo
1 de 16
Baixar para ler offline
Chemoinformatics and
information management

Peter Willett, University of Sheffield, UK
Overview
• What is chemoinformatics and why is it
  necessary
• Managing structural information
• Typical facilities in chemoinformatics
  software
• Examples of current research
Drug discovery: I
• Drug discovery is a vastly complex, multi-disciplinary
  task that can extend over two decades
• The total cost for the discovery and development of a
  novel therapeutic agent is now ca. $1.5B
• Even so, only about 1 in 3 cover the R&D costs
   • But when they can do the pay-offs can be massive: Lipitor in
     2006 made $12.5B (cf MS Windows and Boeing 747)
• Patent cover is 20 years from initial announcement
   • Time is money so need to find potential drugs (and to reject non-
     drugs) much faster (and similarly for agrochemicals)
Drug discovery: II
• Chemoinformatics is one way of increasing the cost
  effectiveness of drug discovery
• Initial work in chemoinformatics as early as the Sixties:
  current interest because of developments in
   • Combinatorial chemistry
   • High throughput screening (HTS)
   • Change from sequential to massively parallel processing
• Resulting explosion in the amounts of data available in
  drug-discovery programmes, and an increased interest
  in computational methods
   • Focus on chemical structure diagram, cf development of other
     types of -informatics specialisms
Definitions
•   F.K. Brown (1998). Annual Reports in Medicinal Chemistry, 33,
    375-384
    • “The use of information technology and management has become
      a critical part of the drug discovery process. Chemoinformatics is
      the mixing of those information resources to transform data into
      information and information into knowledge for the intended
      purpose of making better decisions faster in the area of drug lead
      identification and optimization”
•   G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at
    http://www.warr.com/warrzone.htm
    • “Chem(o)informatics is a generic term that encompasses the
      design, creation, organization, management, retrieval, analysis,
      dissemination, visualization and use of chemical information”
•   J. Gasteiger and T. Engels (editors) (2003). Chemoinformatics:
    a textbook. Wiley-VCH.
    •   “Chemoinformatics is the application of informatics methods to
        solve chemical problems.”
Representation of molecules
• Need for a machine-readable representation
  • 1D – computed/experimental global properties
  • 2D – the chemical structure diagram
  • 3D – atomic coordinate data
• 1D representations handled using conventional
  DBMS software
• Need to manipulate 2D and 3D data
Connection tables

                     9       1   C   2   2   6   1   7   1
                     O       2   C   1   2   3   1
         2                   3   C   2   1   4   2
             1
     3                       4   C   3   2   5   1
                     7
                         8   5   C   4   1   6   2
    4                        6   C   1   1   5   2
                 6
                             7   C   1   1   8   1   9   2
         5                   8   C   7   1
                             9   O   7   2



• An unambiguous representation of a 2D chemical
  structure diagram
• A connection table is a graph, the underlying data
  structure in chemoinformatics
Graph theory and chemistry
• Graph theory
   • Branch of mathematics that describes sets of objects, called
     nodes and the relationships between them, called edges
                                                         O
• A 2D connection table is a graph:                                 Br
   • Nodes correspond to atoms
   • Edges correspond to bonds
                                                  NH 2
• Graph matching algorithms
   • Search chemical databases
• Generation of other representations
Types of search
• Exact structure search (hashed connection table with
  graph isomorphism for collision handling)
• Substructure search (subgraph isomorphism)
   • cf partial or boolean matching in text
• Similarity searching (maximal common subgraph
  isomorphism (or simpler))
   • cf best match search or web searching
• Graph matching algorithms are effective
   • But time is factorial with the number of nodes
   • Need for efficient heuristics
Fingerprints                C
                                         O
                                       C C C
                        C   C   C
                            C




• A fingerprint (or fragment bit-string) is a binary vector
  encoding the presence (“1”) or absence (“0”) of
  fragment substructures in a molecule
• Each bit in the fingerprint represents one molecular
  fragment. Typical length is ~1000 bits
• An approximate representation, but one that can be
  processed very efficiently and hence often used as a
  precursor to graph matching
Chemoinformatics facilities
• Database searching as described previously
   • Structure and substructure searching originally
   • Similarity searching from mid-Eighties
   • 3D substructure searching from mid-Nineties (first rigid then
     flexible)
• Applications
   •   Database clustering
   •   Molecular diversity analysis
   •   Drug-likeness
   •   Virtual screening
        Ligand-based
        Structure-based
3D substructure
                               searching
• Generation of pharmacophore patterns
• Use of MOGA and hyperstructure approaches
                       O           a = 8.62+ 0.58 Angstroms
                                           -                                                       N
                                                                                                               O
                                   b = 7.08+ 0.56 Angstroms
                                           -
               c           a       c = 3.35+ 0.65 Angstroms                                                O
                                           -
                                                                                               O                   O


                       b       N                                                                       O       O
       O
                                                                               S
                                                                                           N
               O                                                           N
                                O
        O                  O                                                               N
                                     O                                         N
                                                   N
                                                                                               O
                       O                                                               O
                                               N       N               N                                   O
                           O                                                               O           O P O O
                                                   N       O                       N
                                                                   N
           O       N                                                                                       O   P O
                                                       O                           N                                   O
                                                               O       N
                           N                                                                                   O P     O
                                                                                           O
                                              O                            O                                       O
                           N   O                       O
                                                                                   O           O
Similarity searching
                     using 2D fingerprints
    Use of data fusion methods to enhance performance,
    combining information from multiple searches


                                   H
                                   N        O
        H    H                                                  H2N
        N    N       OH                                    H
N                                  N        NH2            N
                                                  N
                                   Q uery
        N
                                                           N

                          OH
                                                      H   H2N
HO                                                    N
             N
                               H                      N
                 N             N

                               N
Molecular modelling and QSAR
• Use of computational chemistry to obtain the structures
  and properties of small molecules
   • Quantum mechanics
   • Molecular dynamics
   • Molecular modelling
• Statistical correlation of structure (however described)
  with physical, chemical and biological properties
   • Initially biological activity (QSAR)
   • Now pharmacokinetics and toxicity (ADMET)
Integration with database searching
• Related, but largely separate, research areas for many
  years
   • Simple search operations on very large numbers of molecules
   • Increasingly complex operations on smaller and smaller
     (normally homogeneous) datasets
   • Substructural analysis as an early, notable exception
• The future lies in the integration of these two
  approaches, applying more sophisticated methods on
  larger datasets
   • Docking now well established
   • Property calculations at a database level
   • ADMET
General references
J. Gasteiger (ed.), Handbook of Chemoinformatics (Wiley-VCH,
   Weinheim, 2003).
W.L. Chen, Chemoinformatics: past, present and future, Journal of
  Chemical Information and Modeling 46 (2006) 2230-2255.
D.J. Wild and G.D. Wiggins, Challenges for chemoinformatics
   education in drug discovery, Drug Discovery Today 11 (2006) 436-
   439.
A.R. Leach and V.J. Gillet, An Introduction to Chemoinformatics
   (Kluwer, Dordrecht, 2nd sedition, 2007).
P. Willett, A bibliometric analysis of chemoinformatics, Aslib
   Proceedings 60 (2008) 4-17.

Mais conteúdo relacionado

Destaque

Applications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And ProcessApplications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And Process
Prof. Dr. Basavaraj Nanjwade
 
Substructure Similarity Search in Graph Databases
Substructure Similarity Search in Graph DatabasesSubstructure Similarity Search in Graph Databases
Substructure Similarity Search in Graph Databases
pgst
 
URBAN AREA PRODUCT SIMULATION FOR THE ENMAP HYPERSPECTRAL SENSOR.ppt
URBAN AREA PRODUCT SIMULATION FOR THE ENMAP HYPERSPECTRAL SENSOR.pptURBAN AREA PRODUCT SIMULATION FOR THE ENMAP HYPERSPECTRAL SENSOR.ppt
URBAN AREA PRODUCT SIMULATION FOR THE ENMAP HYPERSPECTRAL SENSOR.ppt
grssieee
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical Structures
Rajarshi Guha
 
Applying cheminformatics and bioinformatics approaches to neglected tropical ...
Applying cheminformatics and bioinformatics approaches to neglected tropical ...Applying cheminformatics and bioinformatics approaches to neglected tropical ...
Applying cheminformatics and bioinformatics approaches to neglected tropical ...
Sean Ekins
 
Application of graph theory in drug design
Application of graph theory in drug designApplication of graph theory in drug design
Application of graph theory in drug design
Reihaneh Safavi
 

Destaque (20)

An Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of AgricultureAn Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of Agriculture
 
Molecular similarity searching methods, seminar
Molecular similarity searching methods, seminarMolecular similarity searching methods, seminar
Molecular similarity searching methods, seminar
 
Chemoinformatic
Chemoinformatic Chemoinformatic
Chemoinformatic
 
Cheminformatics in drug design
Cheminformatics in drug designCheminformatics in drug design
Cheminformatics in drug design
 
Applications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And ProcessApplications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And Process
 
Substructure Similarity Search in Graph Databases
Substructure Similarity Search in Graph DatabasesSubstructure Similarity Search in Graph Databases
Substructure Similarity Search in Graph Databases
 
URBAN AREA PRODUCT SIMULATION FOR THE ENMAP HYPERSPECTRAL SENSOR.ppt
URBAN AREA PRODUCT SIMULATION FOR THE ENMAP HYPERSPECTRAL SENSOR.pptURBAN AREA PRODUCT SIMULATION FOR THE ENMAP HYPERSPECTRAL SENSOR.ppt
URBAN AREA PRODUCT SIMULATION FOR THE ENMAP HYPERSPECTRAL SENSOR.ppt
 
DWPI Markush Database on STN – A New Perspective for Searching Markush Struct...
DWPI Markush Database on STN – A New Perspective for Searching Markush Struct...DWPI Markush Database on STN – A New Perspective for Searching Markush Struct...
DWPI Markush Database on STN – A New Perspective for Searching Markush Struct...
 
Code camp 2014 Talk Scientific Thinking
Code camp 2014 Talk Scientific ThinkingCode camp 2014 Talk Scientific Thinking
Code camp 2014 Talk Scientific Thinking
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical Structures
 
Chem spider introduction spring 2011
Chem spider introduction spring 2011Chem spider introduction spring 2011
Chem spider introduction spring 2011
 
Detection of novel metabolites and enzyme functions though in silico expansio...
Detection of novel metabolites and enzyme functions though in silico expansio...Detection of novel metabolites and enzyme functions though in silico expansio...
Detection of novel metabolites and enzyme functions though in silico expansio...
 
Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical Structure
 
Bio inspiring computing and its application in cheminformatics
Bio inspiring computing and its application in cheminformaticsBio inspiring computing and its application in cheminformatics
Bio inspiring computing and its application in cheminformatics
 
Applying cheminformatics and bioinformatics approaches to neglected tropical ...
Applying cheminformatics and bioinformatics approaches to neglected tropical ...Applying cheminformatics and bioinformatics approaches to neglected tropical ...
Applying cheminformatics and bioinformatics approaches to neglected tropical ...
 
Applied Bioinformatics & Chemoinformatics: Techniques, Tools, and Opportunities
Applied Bioinformatics & Chemoinformatics: Techniques, Tools, and OpportunitiesApplied Bioinformatics & Chemoinformatics: Techniques, Tools, and Opportunities
Applied Bioinformatics & Chemoinformatics: Techniques, Tools, and Opportunities
 
Data Structures
Data StructuresData Structures
Data Structures
 
AVL Tree
AVL TreeAVL Tree
AVL Tree
 
Application of graph theory in drug design
Application of graph theory in drug designApplication of graph theory in drug design
Application of graph theory in drug design
 
Bioinformatics and Drug Discovery
Bioinformatics and Drug DiscoveryBioinformatics and Drug Discovery
Bioinformatics and Drug Discovery
 

Semelhante a Chemoinformatics and information management (6)

kddvince
kddvincekddvince
kddvince
 
Naked DNA And DNA Vaccines A Retrospective
Naked DNA And DNA Vaccines  A RetrospectiveNaked DNA And DNA Vaccines  A Retrospective
Naked DNA And DNA Vaccines A Retrospective
 
Dr vibha bhagat phd synopsis
Dr vibha bhagat phd synopsisDr vibha bhagat phd synopsis
Dr vibha bhagat phd synopsis
 
Structure-Activity Relationships and Networks: A Generalized Approach to Expl...
Structure-Activity Relationships and Networks: A Generalized Approachto Expl...Structure-Activity Relationships and Networks: A Generalized Approachto Expl...
Structure-Activity Relationships and Networks: A Generalized Approach to Expl...
 
Postdoctoral Research @ NAWCWD
Postdoctoral Research @ NAWCWDPostdoctoral Research @ NAWCWD
Postdoctoral Research @ NAWCWD
 
Lessons learned - the pharma experience
Lessons learned  - the pharma experienceLessons learned  - the pharma experience
Lessons learned - the pharma experience
 

Mais de Duncan Hull

Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Duncan Hull
 

Mais de Duncan Hull (20)

Why study plants?
Why study plants?Why study plants?
Why study plants?
 
Embedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumEmbedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculum
 
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyWikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia
 
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusBibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
 
OWL and OBO
OWL and OBOOWL and OBO
OWL and OBO
 
Accessing small molecule data using ChEBI
Accessing small molecule data using ChEBIAccessing small molecule data using ChEBI
Accessing small molecule data using ChEBI
 
How to Blog
How to BlogHow to Blog
How to Blog
 
OWL-XML-Summer-School-09
OWL-XML-Summer-School-09OWL-XML-Summer-School-09
OWL-XML-Summer-School-09
 
Authenticating Scientists with OpenID
Authenticating Scientists with OpenIDAuthenticating Scientists with OpenID
Authenticating Scientists with OpenID
 
The Invisible Scientist
The Invisible ScientistThe Invisible Scientist
The Invisible Scientist
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ Nettab
 
The Year of Blogging Dangerously
The Year of Blogging DangerouslyThe Year of Blogging Dangerously
The Year of Blogging Dangerously
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...Defrosting the Digital Library: A survey of bibliographic tools for the next ...
Defrosting the Digital Library: A survey of bibliographic tools for the next ...
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Chemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upChemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-up
 
Text mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureText mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literature
 
Issues for metabolomics and
Issues for metabolomics and Issues for metabolomics and
Issues for metabolomics and
 
Adding Meaning To Your Data
Adding Meaning To Your DataAdding Meaning To Your Data
Adding Meaning To Your Data
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Chemoinformatics and information management

  • 1. Chemoinformatics and information management Peter Willett, University of Sheffield, UK
  • 2. Overview • What is chemoinformatics and why is it necessary • Managing structural information • Typical facilities in chemoinformatics software • Examples of current research
  • 3. Drug discovery: I • Drug discovery is a vastly complex, multi-disciplinary task that can extend over two decades • The total cost for the discovery and development of a novel therapeutic agent is now ca. $1.5B • Even so, only about 1 in 3 cover the R&D costs • But when they can do the pay-offs can be massive: Lipitor in 2006 made $12.5B (cf MS Windows and Boeing 747) • Patent cover is 20 years from initial announcement • Time is money so need to find potential drugs (and to reject non- drugs) much faster (and similarly for agrochemicals)
  • 4. Drug discovery: II • Chemoinformatics is one way of increasing the cost effectiveness of drug discovery • Initial work in chemoinformatics as early as the Sixties: current interest because of developments in • Combinatorial chemistry • High throughput screening (HTS) • Change from sequential to massively parallel processing • Resulting explosion in the amounts of data available in drug-discovery programmes, and an increased interest in computational methods • Focus on chemical structure diagram, cf development of other types of -informatics specialisms
  • 5. Definitions • F.K. Brown (1998). Annual Reports in Medicinal Chemistry, 33, 375-384 • “The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization” • G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at http://www.warr.com/warrzone.htm • “Chem(o)informatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information” • J. Gasteiger and T. Engels (editors) (2003). Chemoinformatics: a textbook. Wiley-VCH. • “Chemoinformatics is the application of informatics methods to solve chemical problems.”
  • 6. Representation of molecules • Need for a machine-readable representation • 1D – computed/experimental global properties • 2D – the chemical structure diagram • 3D – atomic coordinate data • 1D representations handled using conventional DBMS software • Need to manipulate 2D and 3D data
  • 7. Connection tables 9 1 C 2 2 6 1 7 1 O 2 C 1 2 3 1 2 3 C 2 1 4 2 1 3 4 C 3 2 5 1 7 8 5 C 4 1 6 2 4 6 C 1 1 5 2 6 7 C 1 1 8 1 9 2 5 8 C 7 1 9 O 7 2 • An unambiguous representation of a 2D chemical structure diagram • A connection table is a graph, the underlying data structure in chemoinformatics
  • 8. Graph theory and chemistry • Graph theory • Branch of mathematics that describes sets of objects, called nodes and the relationships between them, called edges O • A 2D connection table is a graph: Br • Nodes correspond to atoms • Edges correspond to bonds NH 2 • Graph matching algorithms • Search chemical databases • Generation of other representations
  • 9. Types of search • Exact structure search (hashed connection table with graph isomorphism for collision handling) • Substructure search (subgraph isomorphism) • cf partial or boolean matching in text • Similarity searching (maximal common subgraph isomorphism (or simpler)) • cf best match search or web searching • Graph matching algorithms are effective • But time is factorial with the number of nodes • Need for efficient heuristics
  • 10. Fingerprints C O C C C C C C C • A fingerprint (or fragment bit-string) is a binary vector encoding the presence (“1”) or absence (“0”) of fragment substructures in a molecule • Each bit in the fingerprint represents one molecular fragment. Typical length is ~1000 bits • An approximate representation, but one that can be processed very efficiently and hence often used as a precursor to graph matching
  • 11. Chemoinformatics facilities • Database searching as described previously • Structure and substructure searching originally • Similarity searching from mid-Eighties • 3D substructure searching from mid-Nineties (first rigid then flexible) • Applications • Database clustering • Molecular diversity analysis • Drug-likeness • Virtual screening Ligand-based Structure-based
  • 12. 3D substructure searching • Generation of pharmacophore patterns • Use of MOGA and hyperstructure approaches O a = 8.62+ 0.58 Angstroms - N O b = 7.08+ 0.56 Angstroms - c a c = 3.35+ 0.65 Angstroms O - O O b N O O O S N O N O O O N O N N O O O N N N O O O O P O O N O N N O N O P O O N O O N N O P O O O O O N O O O O
  • 13. Similarity searching using 2D fingerprints Use of data fusion methods to enhance performance, combining information from multiple searches H N O H H H2N N N OH H N N NH2 N N Q uery N N OH H H2N HO N N H N N N N
  • 14. Molecular modelling and QSAR • Use of computational chemistry to obtain the structures and properties of small molecules • Quantum mechanics • Molecular dynamics • Molecular modelling • Statistical correlation of structure (however described) with physical, chemical and biological properties • Initially biological activity (QSAR) • Now pharmacokinetics and toxicity (ADMET)
  • 15. Integration with database searching • Related, but largely separate, research areas for many years • Simple search operations on very large numbers of molecules • Increasingly complex operations on smaller and smaller (normally homogeneous) datasets • Substructural analysis as an early, notable exception • The future lies in the integration of these two approaches, applying more sophisticated methods on larger datasets • Docking now well established • Property calculations at a database level • ADMET
  • 16. General references J. Gasteiger (ed.), Handbook of Chemoinformatics (Wiley-VCH, Weinheim, 2003). W.L. Chen, Chemoinformatics: past, present and future, Journal of Chemical Information and Modeling 46 (2006) 2230-2255. D.J. Wild and G.D. Wiggins, Challenges for chemoinformatics education in drug discovery, Drug Discovery Today 11 (2006) 436- 439. A.R. Leach and V.J. Gillet, An Introduction to Chemoinformatics (Kluwer, Dordrecht, 2nd sedition, 2007). P. Willett, A bibliometric analysis of chemoinformatics, Aslib Proceedings 60 (2008) 4-17.