Integrating patent chemistry with public and private non-patent research resources

Nicko Goncharoff
Andrew Hinton, PhD
Christopher Southan, PhD
ACS Fall 2012, 19 August
SureChem Data Collection

Database of automatically mined structure data from text and images
• 20M annotated US, EP and WO full-text records, plus Japanese patent abstracts
• 12M unique chemical structures
• MEDLINE: 19M abstracts (coming Q4)
ª  Free resource for researchers!         ª  Professional search needs!
ª  Enables linking to public and          ª  Data export, alerts, patent family
    proprietary content                        search, chemical relevance filters…!




                           ª  API or Data Feed access to
                               chemistry & full text!
                           ª  Integrate with internal
                               databases & workflows
Chemistry Mining Workflow
Public Patent Chemistry Landscape
Current Patent Sources in PubChem
[Bar chart: number of SIDs (PubChem substance IDs) per patent source: EPO (Sling) 10 K, Chemicalize.org 280 K, IBM 2.3 M, Thomson Pharma 3.7 M]
Patent & Literature Sources in PubChem: The Big Three
• Thomson Pharma, patents and literature: 3,756,283 (41% lead-like)
• ChEMBL + PubMed + journals: 918,077 (45% lead-like)
• IBM, pre-2000 patents: 2,369,481 (32% lead-like)
[Venn diagram of overlaps between the three sources; region counts 3,291,940; 281,920; 515,745; 52,975; 129,448; 67,437; 2,113,169]
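The diagram reports each source's total alongside seven region counts. The sketch below shows how such counts can be derived. It assumes each source has been reduced to a set of normalized structure keys, for example standard InChIKeys. The slides do not say which identifier was used, and the source names and keys are toy placeholders, not the real data.

```python
from itertools import product


def venn3_regions(a: set, b: set, c: set) -> dict:
    """Count members of each of the seven regions of a three-set Venn diagram.

    Keys are (in_a, in_b, in_c) membership tuples; the all-False region is
    omitted because nothing outside the union is counted.
    """
    counts = {key: 0 for key in product((True, False), repeat=3) if any(key)}
    for item in a | b | c:
        counts[(item in a, item in b, item in c)] += 1
    return counts


# Toy identifiers standing in for structure keys (e.g. InChIKeys); not real data.
thomson = {"k1", "k2", "k3", "k4"}
chembl = {"k3", "k4", "k5"}
ibm = {"k4", "k5", "k6", "k7"}

regions = venn3_regions(thomson, chembl, ibm)
for membership, n in sorted(regions.items(), reverse=True):
    print(membership, n)
```

Each source total equals the sum of the four regions inside its circle. For the real data, for instance, the Thomson Pharma total of 3,756,283 equals 3,291,940 + 281,920 + 129,448 + 52,975.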
  	
  
SureChem to Deposit All Structures* into PubChem (2012)




• 1976 to present
• Deposition of structures only
• View related patents in SureChemOpen
• *Some filtering of common chemistry likely
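The slide says only that some filtering of common chemistry is likely; it does not describe the method. One plausible approach is a document-frequency cutoff, sketched here as an assumption. Structures that occur in a large fraction of patents (solvents, common reagents, salts) are treated as noise and dropped. The function names, the `max_fraction` threshold and the data are all hypothetical.

```python
from collections import Counter


def common_structures(patent_to_structures: dict, max_fraction: float) -> set:
    """Return structure keys found in more than `max_fraction` of all patents.

    patent_to_structures maps a patent ID to the set of structure keys
    extracted from it; max_fraction is a hypothetical tuning parameter.
    """
    doc_freq = Counter()
    for structures in patent_to_structures.values():
        doc_freq.update(set(structures))  # count each structure once per patent
    n_patents = len(patent_to_structures)
    return {key for key, df in doc_freq.items() if df / n_patents > max_fraction}


def filter_common(patent_to_structures: dict, max_fraction: float = 0.5) -> dict:
    """Remove ubiquitous structures from every patent's structure set."""
    common = common_structures(patent_to_structures, max_fraction)
    return {pid: set(s) - common for pid, s in patent_to_structures.items()}


# Toy data: "water" and "dmso" occur in most patents; "cpd_*" are specific.
patents = {
    "P1": {"water", "dmso", "cpd_a"},
    "P2": {"water", "dmso", "cpd_b"},
    "P3": {"water", "cpd_c"},
    "P4": {"dmso", "cpd_a", "cpd_d"},
}
filtered = filter_common(patents)
print(filtered)
```

A production pipeline might combine a cutoff like this with a curated stop-list, because a fraction-based rule alone can also remove a genuinely important scaffold that recurs across a patent family.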
SureChem and IBM in PubChem (2 Example Patents)

US583593, Inhibitors of squalene synthetase and protein farnesyltransferase, Abbott
[Venn diagram: SureChem total 776, IBM total 527; regions 478 / 298 / 229]

WO-1994018188-A1, 4-hydroxy-benzopyran-2-ones and 4-hydroxy-cycloalkyl[b]pyran-2-ones, HIV protease inhibitors, Upjohn
[Venn diagram: SureChem total 832, IBM total 239; regions 686 / 146 / 93]
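The totals and region counts for the two example patents are related by simple set arithmetic. Write $S$ and $I$ for the SureChem and IBM structure sets of one patent. The numbers only add up if the middle regions (298 and 146) are read as the structures found by both sources, so that reading is assumed below.

```latex
\begin{aligned}
|S \setminus I| &= |S| - |S \cap I|, & |I \setminus S| &= |I| - |S \cap I|, & |S \cup I| &= |S| + |I| - |S \cap I| \\
\text{US583593:}\quad 478 &= 776 - 298, & 229 &= 527 - 298, & 1005 &= 776 + 527 - 298 \\
\text{WO-1994018188-A1:}\quad 686 &= 832 - 146, & 93 &= 239 - 146, & 925 &= 832 + 239 - 146
\end{aligned}
```

On this reading, SureChem alone accounts for 478 of 1005 distinct structures in the first patent and 686 of 925 in the second.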
Identifying Relevant Chemistry: IC50
    US-20120035195-A1, BACE2, Hoffmann-La Roche
Structures with IC50 Values
         US-20120035195-A1
[Screenshots: PDF, SureChemOpen, Excel]
Search IC50 Structures in PubChem
[Screenshot: PubChem search]
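The slide shows the search being run in the PubChem web interface. The same pre-existence check can be scripted against PubChem's PUG REST service using its `compound/inchikey/<key>/cids/JSON` lookup. This is an illustration, not the authors' procedure. It assumes each extracted structure is identified by its standard InChIKey, uses only the Python standard library, and the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(inchikey: str) -> str:
    """Build the PUG REST URL that maps an InChIKey to PubChem compound IDs."""
    return f"{PUG_REST}/compound/inchikey/{urllib.parse.quote(inchikey)}/cids/JSON"


def parse_cids(payload: str) -> list:
    """Extract the CID list from a PUG REST IdentifierList response body."""
    return json.loads(payload).get("IdentifierList", {}).get("CID", [])


def find_in_pubchem(inchikey: str, timeout: float = 30.0) -> list:
    """Return matching CIDs, or [] if PubChem reports no match (HTTP 404)."""
    try:
        with urllib.request.urlopen(cid_lookup_url(inchikey), timeout=timeout) as resp:
            return parse_cids(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # PUG REST signals "not found" with a 404 fault
            return []
        raise
```

For example, `find_in_pubchem("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")` (the standard InChIKey of aspirin) should return that compound's CIDs. It needs network access, and batch jobs should stay within PubChem's published usage limits. An empty result marks a candidate "unique" structure.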
SureChem Unique Contribution
[Venn diagram of IC50-table structures: 79 found only by SureChem; 96 already in PubChem (Thomson Pharma, Chemicalize.org)]

 Stage                               No. of Structures
 Available from SureChem (SC)        1848
 Pre-exist in PubChem                669
 Pre-exist, not from IC50 table      573
 Pre-exist, from IC50 table          96 (12 from Thomson Pharma + 84 via Chemicalize.org)
 Unique to SC, with IC50             79
 Unique to SC, beyond IC50 table     1100
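The rows of this table are nested set operations, and the published counts are internally consistent. There are 573 + 96 = 669 structures already in PubChem, and 669 + 79 + 1100 = 1848 in total. The sketch below reproduces the same breakdown, assuming each structure is represented by a comparable key such as an InChIKey. The function name and the toy data are hypothetical.

```python
def contribution_breakdown(sc: set, pubchem: set, ic50_table: set) -> dict:
    """Split SureChem-extracted structures into the stages used in the table.

    sc:         structures extracted by SureChem from one patent
    pubchem:    structures already present in PubChem
    ic50_table: the subset of sc that appears in the patent's IC50 table
    """
    pre_exist = sc & pubchem
    unique = sc - pubchem
    return {
        "available_from_sc": len(sc),
        "pre_exist_in_pubchem": len(pre_exist),
        "pre_exist_not_ic50": len(pre_exist - ic50_table),
        "pre_exist_ic50": len(pre_exist & ic50_table),
        "unique_sc_ic50": len(unique & ic50_table),
        "unique_sc_beyond_ic50": len(unique - ic50_table),
    }


# Toy keys, not real structures.
sc = {"a", "b", "c", "d", "e", "f"}
pubchem = {"a", "b", "z"}
ic50 = {"b", "c"}

breakdown = contribution_breakdown(sc, pubchem, ic50)
print(breakdown)
```

Because the stages partition `sc`, the same two identities that hold for the published numbers hold for any input. They make a cheap sanity check on a real pipeline.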
Identifying Relevant Chemistry

Patent US-20120035195-A1

http://opentox.informatik.uni-freiburg.de/ches-mapper/
SureChem Chemical Relevance Filtering
• Frequency counts of chemicals within patents
• Additional molecular property filtering, e.g. Lipinski descriptors
• Natural Language Processing (NLP)-based indexing of Exemplified Compounds

Automated indexing of Exemplified Compounds in text
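The slide lists these filters without thresholds. The sketch below covers the first two and assumes molecular descriptors have already been computed by some cheminformatics toolkit. Mention counts within a patent rank the structures. Lipinski's Rule of Five then removes non-drug-like ones: MW ≤ 500, cLogP ≤ 5, at most 5 H-bond donors and at most 10 acceptors, commonly tolerating one violation. The `Descriptors` class, the `min_count` and `max_violations` thresholds and the toy data are hypothetical. The NLP indexing of exemplified compounds is not modeled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular properties for one structure (hypothetical input)."""
    mol_weight: float
    clogp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count Rule-of-Five violations: MW > 500, cLogP > 5, HBD > 5, HBA > 10."""
    return sum([d.mol_weight > 500, d.clogp > 5, d.h_donors > 5, d.h_acceptors > 10])


def relevant_structures(mentions: list, descriptors: dict,
                        min_count: int = 2, max_violations: int = 1) -> list:
    """Rank structures by mention frequency in one patent, then property-filter.

    mentions:    structure keys in the order they were extracted from the text
    descriptors: structure key -> Descriptors
    """
    counts = Counter(mentions)
    return [
        key for key, n in counts.most_common()  # most frequently mentioned first
        if n >= min_count
        and key in descriptors
        and lipinski_violations(descriptors[key]) <= max_violations
    ]


# Toy data: cpd_2 fails three Lipinski criteria; cpd_4 is mentioned only once.
mentions = ["cpd_1", "cpd_2", "cpd_1", "cpd_3", "cpd_1", "cpd_3", "cpd_2", "cpd_4"]
descriptors = {
    "cpd_1": Descriptors(420.5, 3.1, 2, 6),
    "cpd_2": Descriptors(712.9, 6.4, 4, 12),
    "cpd_3": Descriptors(530.0, 4.2, 1, 7),
    "cpd_4": Descriptors(310.2, 2.0, 1, 4),
}
ranked = relevant_structures(mentions, descriptors)
print(ranked)
```

Frequency is applied first because it is cheap and patent-specific. The property filter then acts only on the short list of structures the text actually emphasizes.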
Conclusion
SureChem deposition into PubChem will

  –  Significantly expand public patent chemistry scope
  –  Contribute unique and timely MedChem-relevant data
  –  Enable open drug discovery and chemical biology
  –  Advance progress toward a more open, federated
     chemical information network
