TREC 2010 
Chemical IR Workshop 
Rajarshi Guha 
NIH Chemical Genomics Center 
November 2010 
NIST, Gaithersburg 
Acknowledgements 
•  John Barnard 
•  Joseph Scheiber 
•  Daniel Lowe 
•  David Wild 
•  Nina Jeliazkova 
Outline
•  Chemical structure representation(s)
•  Processing chemistry-related documents
•  Structure retrieval
•  Chemical information toolkits
Chemical structure representation
Based on material from I571, David Wild, Indiana University 
What to Include? 
•  The representation depends on what you want to include (and defines what you can include)
– Atoms
– Connectivity
– Stereochemistry
– Charges/Isotopes
– 3D configuration
C8H9NO3
Visual Representations
•  2D structure diagram is the lingua franca of bench chemists
•  Display is supported by nearly every cheminformatics system
•  Summarizes
–  Connectivity
–  Stereochemistry
–  Charge/isotope
•  We can use 2D representations as the starting point for many cheminformatics tasks
Chemical Names 
•  Most papers and documents referring to 
chemistry will name molecules 
•  Two forms of chemical names 
– Trivial (short, pronounceable, uninformative)
– Systematic (long, unpronounceable, can usually get structure back from the name)
Tyrosine 
or 
β‐(p‐hydroxyphenyl)alanine or 
α‐amino‐p‐hydroxyhydrocinnamic acid or 
2‐amino‐3‐(4‐hydroxyphenyl) propanoic acid 
Chemical Numbers 
•  Also termed registry numbers 
•  Arbitrary numbers assigned to one (or more) 
structures 
•  Sometimes a hierarchy might be present
– Parent compound – stereoisomer – salt … 
•  CAS numbers / PubChem SID & CID / InChI Key 
•  Only way to get back structure is lookup 
Structures are Graphs (mostly) 
•  For many cases, a chemical structure can be 
considered as a graph 
– Atoms are nodes, bonds are edges 
– Identical graphs imply identical molecules
•  But there are limits to this representation
– Polymers, inorganic compounds, stereochemistry
•  And chemical phenomena can create problems for a graph theoretical approach
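The graph view can be sketched in a few lines of Python. This is only an illustration: the `Molecule` class and its method names are assumptions made for this example, not any toolkit's API, and hydrogens are left implicit.

```python
from collections import Counter


class Molecule:
    """Tiny labelled graph: atoms are nodes, bonds are edges."""

    def __init__(self):
        self.atoms = []   # element symbol for each atom index
        self.bonds = {}   # frozenset({i, j}) -> bond order

    def add_atom(self, symbol):
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        self.bonds[frozenset((i, j))] = order

    def neighbors(self, i):
        # All atoms that share a bond with atom i
        return sorted(j for bond in self.bonds for j in bond if i in bond and j != i)

    def heavy_atom_counts(self):
        return dict(Counter(self.atoms))


# Ethanol (CH3-CH2-OH), heavy atoms only
ethanol = Molecule()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)

print(ethanol.neighbors(c2))
print(ethanol.heavy_atom_counts())
```

Storing bonds under an unordered pair means the bond between atoms 0 and 1 is the same entry whichever way round it is written, which matches the undirected nature of a molecular graph.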
Aromaticity & Graphs
•  These two molecules (the two alternating single/double bond drawings of benzene) are identical
•  Yet their molecular graphs would suggest a different connectivity
•  In fact, all atoms and bonds in benzene are equivalent due to resonance
•  In this case, we would (should) perceive each C-C bond as aromatic
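A small sketch of why aromaticity perception matters, assuming a ring stored as a mapping from atom pairs to bond orders. The function below is deliberately naive: it recognises only the alternating six-membered ring pattern and is not a general aromaticity model.

```python
def ring_bonds(orders):
    """Map each ring bond (i, i+1 mod n) to the given bond order."""
    n = len(orders)
    return {frozenset((i, (i + 1) % n)): order for i, order in enumerate(orders)}


# Two drawings of benzene: alternating double/single bonds,
# shifted by one position around the ring.
kekule_a = ring_bonds([2, 1, 2, 1, 2, 1])
kekule_b = ring_bonds([1, 2, 1, 2, 1, 2])


def perceive_aromatic_six_ring(bonds):
    """Relabel a fully alternating six-membered ring as aromatic."""
    orders = [bonds[frozenset((i, (i + 1) % 6))] for i in range(6)]
    alternating = all(orders[i] != orders[(i + 1) % 6] for i in range(6))
    if alternating and set(orders) == {1, 2}:
        return {bond: "aromatic" for bond in bonds}
    return dict(bonds)


# The raw bond tables differ, the perceived ones do not.
print(kekule_a == kekule_b)
print(perceive_aromatic_six_ring(kekule_a) == perceive_aromatic_six_ring(kekule_b))
```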
1D Representations
•  1D representations are linear strings
•  Generally only encode connectivity, atom and bond type
– Wiswesser line notation (WLN)
– Sybyl line notation (SLN)
– SMILES
– InChI
2D/3D Representations
•  Multi-line text formats
•  Contain connectivity, atom and bond types, 3D coordinates as well as other (possibly arbitrary) information
– MDL MOL format
– PDB
– Hyperchem HIN
– Chemical Markup Language (CML)
SMILES
•  A simple line notation, pretty much the lingua franca of cheminformatics
•  Atoms are represented by their symbols
–  Lower case indicates an aromatic atom
•  Single bonds are implicit, double bonds are "=", triple bonds are "#" and aromatic bonds are ":"
•  Rings are indicated by "ring closure numbers"
–  In C1CC1, the two carbons marked by "1" are connected
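To make the SMILES rules concrete, here is a minimal parser sketch for a small subset of the notation. It is an assumption-laden toy: it handles only one-letter atoms, explicit bond symbols, branches and single-digit ring closures, and it records implicit bonds with order 1 (aromaticity is carried only by the lower-case atom symbols). Real toolkits do far more.

```python
def parse_simple_smiles(smiles):
    """Parse a small SMILES subset into (atoms, bonds).

    Bracket atoms, charges and two-letter elements such as Cl
    are not handled.
    """
    bond_orders = {"-": 1, "=": 2, "#": 3, ":": 1.5}
    atoms, bonds = [], []
    prev = None         # atom that the next atom will bond to
    pending = None      # explicit bond symbol waiting for its second atom
    branch_stack = []   # atoms to return to when a branch closes
    open_rings = {}     # ring closure digit -> (atom index, bond symbol)

    for ch in smiles:
        if ch.isalpha():
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, bond_orders.get(pending, 1)))
            prev, pending = idx, None
        elif ch in bond_orders:
            pending = ch
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                # Second occurrence of the digit: close the ring
                start, symbol = open_rings.pop(ch)
                bonds.append((start, prev, bond_orders.get(pending or symbol, 1)))
            else:
                open_rings[ch] = (prev, pending)
            pending = None
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds


atoms, bonds = parse_simple_smiles("C1CC1")
print(atoms, bonds)
```

In the cyclopropane example the last bond joins atoms 0 and 2, the two carbons carrying the ring closure number.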
Canonical SMILES 
•  Given a structure, you can write a SMILES representation in multiple ways
  CC(C)CC 
  CCC(C)C 
•  As a result, comparing molecules or 
searching for molecules based on arbitrary 
SMILES can give misleading or wrong results 
•  To avoid this we canonicalize SMILES 
Canonical SMILES 
•  Given two structures that have identical atoms, bonds and connectivity, their canonical SMILES will be identical
•  In general, any permutation of the atom indices will not affect the canonical SMILES
•  Canonicalization is a key feature of structure registration
– You want to be sure that you have a single, unique representation of each structure in the database
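The invariance under atom renumbering can be illustrated with a brute-force sketch. This is not how toolkits canonicalize (they use much faster ranking schemes); it simply tries every renumbering of a small graph and keeps the smallest description, which is enough to show the idea. The function name and data layout are assumptions for this example.

```python
from itertools import permutations


def canonical_key(atoms, bonds):
    """Smallest (labels, edges) description over all atom renumberings.

    Factorial cost, so only usable for a handful of atoms.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):   # perm[old_index] = new_index
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(
            tuple(sorted((perm[i], perm[j]))) + (order,) for i, j, order in bonds
        )
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best


# 2-methylbutane numbered as in CC(C)CC and as in CCC(C)C
first = (["C"] * 5, [(0, 1, 1), (1, 2, 1), (1, 3, 1), (3, 4, 1)])
second = (["C"] * 5, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (2, 4, 1)])
# n-pentane, a different molecule with the same atoms
pentane = (["C"] * 5, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)])

print(canonical_key(*first) == canonical_key(*second))
print(canonical_key(*first) == canonical_key(*pentane))
```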
Generating Canonical SMILES
•  All toolkits are capable of generating these
•  Using OpenBabel at the command line is easy
•  Can convert lots of other file formats to canonical SMILES
•  Note that different products have different canonicalization algorithms
$ echo "c1(O)ccccc1" | /usr/local/openbabel/bin/babel -ismi -ocan -c
Oc1ccccc1
$ echo "c1(ccccc1)O" | /usr/local/openbabel/bin/babel -ismi -ocan -c
Oc1ccccc1
Tautomerism 
•  Simply generating a unique string representation of a molecule may not be sufficient to uniquely identify that molecule
•  Tautomerism is a reaction involving the movement of a H, resulting in a change of bond order
Tautomerism 
•  Due to the rapid equilibrium the molecule 
exists in both forms – which one do we store? 
– Consider the most stable tautomer 
– Just go with a ‘canonical’ tautomer 
•  In general, you need to generate the canonical SMILES for a canonical tautomer
•  Using InChI, you can choose whether or not to take tautomers into account
Tautomerism & InChI
http://www.chemspider.com/blog/does-inchi-account-for-tautomers.html
With mobile-H perception on
With mobile-H perception off
Markush Structures 
•  Compact representation of a set or class of specific compounds with common structural features
•  Used in
–  chemical patents
–  query structures in substructure search systems
–  Quantitative Structure-Activity Relationship (QSAR) analysis
•  class of related compounds with activity data
–  combinatorial libraries
•  rapid synthesis of large numbers of related compounds
–  legislation (controlled drugs, chemical weapons)
Markush Structures 
R1 
R2 
Markush Structures
•  S-variation (substituent variation)
– List of alternative values for an R-group
•  P-variation (position variation)
– Variable point of attachment
•  F-variation (frequency variation)
– Multiple occurrence of groups
•  H-variation (homology variation)
– Generically described group (e.g., alkyl)
– A (possibly) infinite set of S-variations
Markush Structures
•  S-variation – R1 is methyl or ethyl
•  H-variation – R2 is alkyl
•  P-variation – R3 is amino
•  F-variation – 3 (but more generally can be any number, say m)
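As a rough illustration of S- and H-variation, the sketch below enumerates a tiny Markush structure by filling R-group slots in a SMILES template. The scaffold, the slot syntax and the truncation of "alkyl" to straight chains of one to three carbons are all assumptions made for the example; P- and F-variation are not covered.

```python
from itertools import product

# Hypothetical para-disubstituted benzene scaffold with two R-group slots
SCAFFOLD = "{R1}c1ccc({R2})cc1"

R_GROUPS = {
    "R1": ["C", "CC"],                      # S-variation: methyl or ethyl
    "R2": ["C" * n for n in range(1, 4)],   # H-variation "alkyl", cut off at C1-C3
}


def enumerate_markush(scaffold, r_groups):
    """Yield one SMILES string per combination of R-group values."""
    names = sorted(r_groups)
    for choice in product(*(r_groups[name] for name in names)):
        yield scaffold.format(**dict(zip(names, choice)))


members = list(enumerate_markush(SCAFFOLD, R_GROUPS))
for smiles in members:
    print(smiles)
```

Even this toy case multiplies quickly (2 x 3 members here), and an unbounded "alkyl" cannot be enumerated at all, which is why enumeration is usually avoided.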
Markush Structures  
•  Can be considered as formal "grammar" for generating valid molecules ("sentences")
•  Enumeration of coverage usually impractical and often impossible (infinite sets)
•  Appropriate algorithms for handling take advantage of Markush representation
– Avoid enumeration (especially infinite sets)
– Compare finite grammars rather than infinite sets of valid sentences
The Markush Problem  
•  Representation
– Mixture of structures and text
– Generic expressions (viz., H-variation)
– Vagueness (" … where by X we mean …")
•  Searching
– Translation problem – specific groups (ethyl, butyl) must be matched against an expression (1-6C alkyl)
– Segmentation problem – boundaries between R-groups and the scaffold may not coincide
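The translation problem can be sketched as a membership test against the generic expression instead of an enumeration of it. The expression syntax parsed here and the small name table are assumptions for illustration; a real system would work on structures rather than on group names.

```python
import re

# Hypothetical lookup of straight-chain alkyl names to carbon counts
ALKYL_CARBONS = {
    "methyl": 1, "ethyl": 2, "propyl": 3, "butyl": 4,
    "pentyl": 5, "hexyl": 6, "heptyl": 7,
}


def parse_alkyl_range(expression):
    """Read a generic term such as '1-6C alkyl' into (min, max) carbon counts."""
    match = re.fullmatch(r"(\d+)-(\d+)C alkyl", expression)
    if match is None:
        raise ValueError(f"unrecognised generic expression: {expression!r}")
    return int(match.group(1)), int(match.group(2))


def matches_generic(specific, expression):
    """True if the specific group falls inside the generic alkyl range."""
    low, high = parse_alkyl_range(expression)
    carbons = ALKYL_CARBONS.get(specific)
    return carbons is not None and low <= carbons <= high


for group in ("ethyl", "butyl", "heptyl", "phenyl"):
    print(group, matches_generic(group, "1-6C alkyl"))
```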
Markush Translation/Segmentation
Chemical Similarity 
•  When are two molecules similar? 
– They are both 6‐membered rings 
– Both have carbons 
– Both have only single bonds 
•  In the second case, both have 
a N 
•  Many ways to define similarity 
•  Critically dependent on representation
Why Chemical Similarity? 
•  Much of medicinal chemistry is based on the 
similarity principle 
– Similar molecules exhibit similar activities
– J. Med. Chem., 2002, 45, 4350-4358
•  But there are many exceptions
•  Even then, looking for similar molecules gives us a useful starting point in many cases
Fingerprint Based Similarity 
•  We can represent a chemical structure using a bit string representation
•  The example is a "key" fingerprint
– Each bit position corresponds to a specific structural feature
1 0 1 1 0 0 0 1 0
Fingerprint Based Similarity 
•  The similarity between two molecules is then 
defined in terms of the similarity between 
their fingerprints 
– Tanimoto, Dice, Cosine, Tversky 
•  Clearly, depending on the nature of the 
fingerprints, two molecules can be more or 
less similar 
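The coefficients named above can be written down directly if a fingerprint is held as the set of its "on" bit positions. This is a sketch with the standard textbook definitions; the second fingerprint is a made-up example and both fingerprints are assumed to be non-empty.

```python
import math


def tanimoto(a, b):
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b))


def cosine(a, b):
    return len(a & b) / math.sqrt(len(a) * len(b))


def tversky(a, b, alpha, beta):
    # alpha and beta weight the bits unique to a and to b respectively
    common = len(a & b)
    return common / (common + alpha * len(a - b) + beta * len(b - a))


# Bits set in "1 0 1 1 0 0 0 1 0" and in a hypothetical second molecule
fp_a = {0, 2, 3, 7}
fp_b = {0, 2, 4, 7}

print(tanimoto(fp_a, fp_b))
print(dice(fp_a, fp_b))
print(cosine(fp_a, fp_b))
print(tversky(fp_a, fp_b, 1.0, 1.0))
```

With alpha = beta = 1 the Tversky index reduces to Tanimoto, and with alpha = beta = 0.5 it reduces to Dice, so the same pair of molecules can look more or less similar depending on the coefficient chosen.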
Fingerprint Similarity 
•  Information loss – fragment presence and absence instead of counts
•  Bit string saturation – within a large database almost all bits are set
•  The average similarity appears to increase with the complexity of the query compound
•  Larger queries are more discriminating (flatter curve, Tanimoto values spread wider)
•  Smaller queries have a sharp peak, unable to distinguish between molecules
[Figure: the distribution of Tanimoto values found in database searches with a range of query molecules]
Flower D., On the Properties of Bit String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998
Nina Jeliazkova, http://vedina.users.sourceforge.net/publications/2005/ChemicalSimilarity.ppt
Physical Similarity  
•  Keyed fingerprints are inevitably lossy
•  Hashed and circular fingerprints can be better
•  But in the end they both ignore the 3D structure of a molecule
•  Shape and surface-property based similarities can be more relevant
– Slower to evaluate
– OpenEye ROCS is a tool to evaluate shape similarity
Structural Similarity & 3D Property Variation
Nina Jeliazkova, http://vedina.users.sourceforge.net/publications/2005/ChemicalSimilarity.ppt
Similarity Indices 
Association indices, Correlation indices
J. D. Holliday, C-Y. Hu and P. Willett (2002), Grouping of Coefficients for the Calculation of Inter-Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings, Combinatorial Chemistry & High Throughput Screening, 5, 155-166
Nina Jeliazkova, http://vedina.users.sourceforge.net/publications/2005/ChemicalSimilarity.ppt
Structure retrieval 
What Are We Asking For? 
Exact matches? Similar topology? Substructure matches? Similar properties?
What do we have?
Caveat 
•  If structures are not registered properly, results 
of queries can be misleading, incomplete or 
wrong 
•  At the very least 
– Remove salts 
– Generate canonical tautomers 
– Create a canonical SMILES or InChI 
Exact Structure Retrieval 
Q: Given a structure X, does the database have any instances of X?
•  The trivial way to do this is via string matching 
•  Could match on canonical SMILES 
•  But then you have to ensure that the query 
and the database employ the same 
canonicalizer 
Exact Structure Retrieval 
Q: Given a structure X, does the database have any instances of X?
•  The trivial way to do this is via string matching
•  Better to use a hash code such as the InChIKey, derived from the InChI
•  Need to be careful that you and the database are using the same settings (i.e., standard InChI)
Exact Structure Retrieval 
•  This type of query is generally only useful during database registration
•  Is also handy when trying to match up one collection against another
•  But can trip you up 
– Database stores stereochemistry, your query has 
none 
– You won’t find a match, if you use a full InChI 
– Similar problems with tautomerism 
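The stereochemistry pitfall can be shown with a string-level sketch. It assumes the usual InChI layer prefixes (`/b`, `/t`, `/m`, `/s` for stereo) and simply drops those layers; in practice it is safer to regenerate identifiers with a toolkit than to edit them as text. The one-record database and its record label are hypothetical.

```python
STEREO_LAYER_PREFIXES = ("b", "t", "m", "s")


def strip_stereo_layers(inchi):
    """Drop the stereo layers of an InChI string, keeping everything else."""
    head, *layers = inchi.split("/")
    kept = [layer for layer in layers if not layer.startswith(STEREO_LAYER_PREFIXES)]
    return "/".join([head] + kept)


# L-alanine with stereochemistry, as a database might store it
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
# The same connectivity drawn without stereochemistry, as a query
QUERY = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"

database = {L_ALANINE: "record-1"}   # hypothetical record label

# Exact match on the full InChI misses ...
print(QUERY in database)

# ... but an index keyed on the stereo-free form finds the record.
connectivity_index = {strip_stereo_layers(key): value for key, value in database.items()}
print(connectivity_index.get(QUERY))
```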
Similar Structure Retrieval 
Q: Given a structure X, does the database have any molecules similar to X?
•  This can open up a can of worms!
•  Most common case is to find structurally similar molecules in terms of 2D
•  However, one can also consider 3D structural (i.e., shape) similarity
•  Finally, one could also identify similar molecules based on similar physicochemical properties
Property Similarity 
•  Each molecule can be represented as an N-dimensional vector of numbers
– These can represent structural descriptors (number of rings, graph invariants)
– Physical characteristics (log P, polar surface area)
•  Similarity is then defined in terms of the 
distance between the descriptor vectors of the 
query molecule & the target molecules 
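A minimal sketch of property-based ranking, using Euclidean distance between descriptor vectors. The molecule names and descriptor values are invented for illustration, and the descriptors are left unscaled for brevity; in practice they would normally be scaled first so that one large-valued descriptor does not dominate.

```python
import math

# Hypothetical descriptor vectors: (ring count, log P, polar surface area)
LIBRARY = {
    "mol_a": (1, 1.2, 40.0),
    "mol_b": (2, 3.5, 20.0),
    "mol_c": (1, 0.9, 45.0),
}


def rank_by_distance(query, library):
    """Return molecule names ordered from nearest to farthest from the query."""
    return sorted(library, key=lambda name: math.dist(query, library[name]))


query_vector = (1, 1.0, 40.0)
ranking = rank_by_distance(query_vector, LIBRARY)
print(ranking)
```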
Substructure Retrieval 
Q: Given a structure X, does the database have any molecules that contain X?
•  This is basically subgraph isomorphism
•  There are a number of variations
•  Find an exact substructure
•  Fuzzy substructure (e.g., ignore atom type)
•  Maximum common substructure
Substructure Retrieval 
•  The basic subgraph isomorphism algorithms 
are quite well known 
•  All cheminformatics toolkits support this
•  In the simplest approach, we can specify a 
SMILES string as a query 
– c1ccccc1C(=O) as a query looks for molecules 
containing a benzaldehyde moiety 
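Substructure matching can be sketched as a plain backtracking search over labelled graphs. This is a teaching example under simplifying assumptions, not a toolkit algorithm: bond orders are ignored, hydrogens are implicit, and each query atom is given as a set of allowed element symbols so that a fuzzy query can accept more than one atom type.

```python
def _adjacency(n, bonds):
    adj = [set() for _ in range(n)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def find_substructure(query_atoms, query_bonds, target_atoms, target_bonds):
    """Return one mapping {query index: target index}, or None if no match."""
    q_adj = _adjacency(len(query_atoms), query_bonds)
    t_adj = _adjacency(len(target_atoms), target_bonds)

    def extend(mapping):
        q = len(mapping)                      # next query atom to place
        if q == len(query_atoms):
            return dict(mapping)
        for t in range(len(target_atoms)):
            if t in mapping.values():
                continue                      # target atom already used
            if target_atoms[t] not in query_atoms[q]:
                continue                      # atom type not allowed
            # Every already-placed query neighbour must be bonded in the target too
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[q]                # backtrack
        return None

    return extend({})


# Query: (C or N) bonded to C bonded to O
query_atoms = [{"C", "N"}, {"C"}, {"O"}]
query_bonds = [(0, 1), (1, 2)]

# Targets, heavy atoms only: acetamide CC(=O)N and dimethyl ether COC
acetamide = (["C", "C", "O", "N"], [(0, 1), (1, 2), (1, 3)])
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])

print(find_substructure(query_atoms, query_bonds, *acetamide))
print(find_substructure(query_atoms, query_bonds, *dimethyl_ether))
```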
Generic Substructure Queries 
•  Using a SMILES as a query implies that you 
look for a specific substructure 
•  What about finding molecules containing an aromatic ring connected to carbonyl via a N or a C?
•  Valid molecules would be [figure: example structures]
•  We can perform these queries using SMARTS
SMARTS Queries 
•  Regular expressions for chemical structures 
•  The previous query is achieved by   
c1ccccc1[c,C,n,N]C=O
•  Very powerful system and fundamental to many cheminformatics methods
•  Also see SMARTSViewer to visualize SMARTS 
queries and the Daylight matcher to test them 
Substructure Retrieval Performance? 
•  Naively, SS queries require one to check each 
entry in a database 
•  But can be significantly sped up by first making sure that the target molecule has all the features of the query molecule
– A target that lacks a query feature can never match
– Can do this by comparing fingerprints, which is much faster than doing graph isomorphism
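The fingerprint pre-screen amounts to a bitwise subset test. In the sketch below fingerprints are plain Python integers and the database entries are invented; only molecules that pass the screen would go on to the expensive graph matching step.

```python
def passes_screen(query_fp, target_fp):
    """True if every bit set in the query is also set in the target."""
    return (query_fp & target_fp) == query_fp


# Hypothetical fingerprints, one integer per molecule
DATABASE = {
    "mol_1": 0b101101,
    "mol_2": 0b100001,
    "mol_3": 0b111111,
}

query_fp = 0b001101

# Only these candidates need the full subgraph isomorphism check
candidates = [name for name, fp in DATABASE.items() if passes_screen(query_fp, fp)]
print(candidates)
```

The screen can produce false positives (a candidate that passes may still fail graph matching) but never false negatives, provided the fingerprint bits reflect features that any superstructure of the query must also contain.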
Maximum Common Substructure 
•  Largest subgraph common to two structures 
•  NP‐complete problem 
•  Many approaches try to identify an approximate solution
•  Multiple MCS can be used to identify core scaffolds
3D (sub) Graph Isomorphism 
•  Also known as shape matching (if we consider 
whole molecules) or pharmacophore 
searching (if we consider substructures) 
•  Essentially identify molecules that contain a 3D geometric motif
– The motif is defined in terms of atoms/groups and distance/angle/dihedral constraints amongst them
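A sketch of a distance-only pharmacophore check for a single conformer. The three-point motif, its distance ranges and the coordinates are all hypothetical, and angle and dihedral constraints are left out; the search is a brute-force assignment of molecule features to motif points.

```python
import math
from itertools import permutations

# Hypothetical motif: feature types plus allowed distance ranges in angstroms
MOTIF_FEATURES = ["donor", "acceptor", "aromatic"]
MOTIF_DISTANCES = {(0, 1): (2.5, 3.5), (0, 2): (4.0, 6.0), (1, 2): (4.0, 6.0)}


def match_motif(features):
    """features: list of (type, (x, y, z)). Return matching indices or None."""
    for combo in permutations(range(len(features)), len(MOTIF_FEATURES)):
        # combo[q] is the molecule feature assigned to motif point q
        if any(features[t][0] != MOTIF_FEATURES[q] for q, t in enumerate(combo)):
            continue
        if all(
            low <= math.dist(features[combo[i]][1], features[combo[j]][1]) <= high
            for (i, j), (low, high) in MOTIF_DISTANCES.items()
        ):
            return combo
    return None


conformer = [
    ("aromatic", (0.0, 0.0, 0.0)),
    ("donor", (5.0, 0.0, 0.0)),
    ("acceptor", (5.0, 3.0, 0.0)),
]
stretched = [
    ("aromatic", (0.0, 0.0, 0.0)),
    ("donor", (5.0, 0.0, 0.0)),
    ("acceptor", (5.0, 8.0, 0.0)),
]

print(match_motif(conformer))
print(match_motif(stretched))
```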
3D (sub) Graph Isomorphism 
Processing chemistry-related documents
Parsing Chemical Documents 
•  I have minimal expertise in this area and am more of a user than a developer of such tools
•  Two step process
– Chemical entity extraction (from text or images)
– Entity (i.e., name) to structure conversion
•  Variety of tools for both steps
Parsing Chemical Documents 
Text entity recognition
(a)  Extractors (IUPAC names)
- TEMIS Chemical Entity Relationships Skill Cartridge
- Accelrys Pipeline Pilot extractor (Notiora)
- Fraunhofer (ProMiner Chemistry)
- Chemaxon (chemicalize.org)
- Oscar (Corbett, Murray-Rust et al.)
- SureChem
- IBM ChemFrag Annotator
(b)  Converter
- CambridgeSoft name=struct
- Openeye Lexichem
- Chemaxon
Image recognition
- OSRA (NIH)
- Clide Pro (Keymodule Ltd.)
- Fraunhofer chemoCR
- ChemReader
http://www.chemaxon.com/library/user-presentations/chemical-entity-extraction-using-the-chemicalize-org-technology/
Entity Extraction
•  Many tools use some form of dictionary lookup (PubChem or ChEBI is a good source of chemical terms)
•  But dictionaries are certainly not sufficient
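A toy dictionary lookup makes the limitation visible. The three-entry dictionary and the sample sentence are made up for this sketch; a real dictionary would be built from a source such as PubChem or ChEBI.

```python
import re

# Hypothetical, tiny dictionary of known chemical names
DICTIONARY = {"tyrosine", "benzene", "benzaldehyde"}


def find_known_names(text):
    """Return dictionary terms found in the text, in order of appearance."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return [token for token in tokens if token in DICTIONARY]


sentence = (
    "Tyrosine, also named 2-amino-3-(4-hydroxyphenyl)propanoic acid, "
    "was dissolved in benzene."
)
found = find_known_names(sentence)
print(found)
```

The trivial names are found, but the systematic name in the same sentence is missed entirely: it is not in the dictionary and word-level tokenization breaks it into fragments, which is why dictionary lookup is usually combined with grammar-based or statistical name recognition.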
Daniel Lowe, 239th ACS National Meeting
OPSIN 
OPSIN 
•  Can be used as a library or as a command line 
tool 
•  Outputs CML by default, but this can easily be 
converted to other formats by CDK or 
OpenBabel 
•  See here for some benchmarks of OPSIN 
versus Lexichem and ChemAxon 
OPSIN Example 
package gov.nih.ncgc;

import nu.xom.Element;
import uk.ac.cam.ch.wwmm.opsin.NameToStructure;
import uk.ac.cam.ch.wwmm.opsin.NameToStructureException;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/** Reads chemical names, one per line, and prints the CML that OPSIN generates for each. */
public class OpsinExample {
    private final String filename;

    public OpsinExample(String filename) {
        this.filename = filename;
    }

    public void run() throws NameToStructureException, IOException {
        NameToStructure nameToStructure = NameToStructure.getInstance();
        // try-with-resources closes the reader even if parsing fails part-way
        try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Element cmlElement = nameToStructure.parseToCML(line);
                // Guard against names for which no structure was produced
                if (cmlElement != null) {
                    System.out.println(cmlElement.toXML());
                }
            }
        }
    }

    public static void main(String[] args) throws IOException, NameToStructureException {
        OpsinExample oe = new OpsinExample("/Users/guhar/Documents/Presentations/trec/sw/chemnames.txt");
        oe.run();
    }
}
Lexichem Example 
from openeye.oechem import *
from openeye.oeiupac import *

mol = OEGraphMol()
# One chemical name per line; the molecule object is reused between names
with open('chemnames.txt') as names:
    for line in names:
        line = line.strip()
        mol.Clear()
        OEParseIUPACName(mol, line)
        print(OECreateCanSmiString(mol))
Chemical information toolkits
Software Tools
•  Chemical Information Toolkits
– CDK (Java, LGPL) 
– OpenBabel (C++, GPL2) 
– RDKit (C++, BSD) 
– Indigo (C++, Dual licensed) 
– JChem (Java, commercial, free for academics) 
– OEChem (C++, commercial, free for academics) 
– Daylight (C, commercial) 
Software Tools
•  Name to Structure applications
– OSCAR3 (Java, LGPL) consisting of OPSIN and ChemTok
– ChemAxon (Java, commercial, free for academics) 
– LexiChem (C++, commercial, free for academics) 
