31. Fingerprint Similarity
• Information loss – fingerprints record fragment presence or absence instead of counts
• Bit string saturation – within a large database almost all bits are set
• The average similarity appears to increase with the complexity of the query compound
• Larger queries are more discriminating (flatter curve, Tanimoto values spread wider)
• Smaller queries give a sharp peak and are unable to distinguish between molecules
Flower D., On the Properties of Bit String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998
The distribution of Tanimoto values
found in database searches with a
range of query molecules
Nina Jeliazkova, http://vedina.users.sourceforge.net/publications/2005/ChemicalSimilarity.ppt
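The information-loss point can be made concrete with a small sketch. It is not taken from the slide: the fragment lists are hypothetical, and the count-based variant uses the min/max generalisation of Tanimoto, which is one common choice among several. Two chains of very different length share the same fragment types, so the presence/absence comparison cannot tell them apart, while the count-based one can.

```python
from collections import Counter


def tanimoto_binary(frags_a, frags_b):
    """Tanimoto on presence/absence: |A and B| / |A or B|."""
    set_a, set_b = set(frags_a), set(frags_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(set_a & set_b) / len(union)


def tanimoto_counts(frags_a, frags_b):
    """Count-based (min/max) generalisation of Tanimoto."""
    count_a, count_b = Counter(frags_a), Counter(frags_b)
    keys = set(count_a) | set(count_b)
    shared = sum(min(count_a[k], count_b[k]) for k in keys)
    total = sum(max(count_a[k], count_b[k]) for k in keys)
    return shared / total if total else 0.0


# Hypothetical fragment lists for a short and a long alkane chain
short_chain = ["CH3", "CH2", "CH2", "CH3"]
long_chain = ["CH3"] + ["CH2"] * 10 + ["CH3"]

print(tanimoto_binary(short_chain, long_chain))  # 1.0: identical bit patterns
print(tanimoto_counts(short_chain, long_chain))  # about 0.33: counts differ
```

With real fingerprints the same effect shows up whenever a fragment occurs many times in one molecule and once in the other.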
34. Similarity Indices
Association indices Correlation indices
J. D. Holliday, C-Y. Hu and P. Willett (2002), Grouping of Coefficients for the Calculation of Inter-Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings, Combinatorial Chemistry & High Throughput Screening, 5, 155-166
Nina Jeliazkova, http://vedina.users.sourceforge.net/publications/2005/ChemicalSimilarity.ppt
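The coefficients shown on the original slide are not recoverable from this transcript, so the following is only an illustrative sketch of the two families, using one standard representative of each: Tanimoto as an association index and the Pearson (phi) coefficient as a correlation index. Function names and the example bit strings are assumptions. The practical difference it shows is that the correlation index takes jointly unset bits into account, while Tanimoto ignores them.

```python
import math


def contingency(x, y):
    """Count the four bit-pair cases for two equal-length 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("bit strings must have the same length")
    n11 = sum(1 for p, q in zip(x, y) if p and q)      # set in both
    n10 = sum(1 for p, q in zip(x, y) if p and not q)  # set only in x
    n01 = sum(1 for p, q in zip(x, y) if q and not p)  # set only in y
    n00 = len(x) - n11 - n10 - n01                     # set in neither
    return n11, n10, n01, n00


def tanimoto(x, y):
    """Association index: ignores bits that are unset in both strings."""
    n11, n10, n01, _ = contingency(x, y)
    den = n11 + n10 + n01
    return n11 / den if den else 0.0


def pearson(x, y):
    """Correlation index (phi coefficient): also uses joint absences."""
    n11, n10, n01, n00 = contingency(x, y)
    den = math.sqrt((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01))
    return (n11 * n00 - n10 * n01) / den if den else 0.0


# Hypothetical fingerprints; the padded pair adds four bits unset in both
x4, y4 = [1, 1, 0, 0], [1, 0, 1, 0]
x8, y8 = x4 + [0] * 4, y4 + [0] * 4

print(tanimoto(x4, y4), pearson(x4, y4))  # Tanimoto 1/3, Pearson 0.0
print(tanimoto(x8, y8), pearson(x8, y8))  # Tanimoto unchanged, Pearson rises to 1/3
```

Padding both strings with shared zeros leaves Tanimoto untouched but changes Pearson, which is why the two families can rank the same database differently.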
57. OPSIN Example
package gov.nih.ncgc;

import nu.xom.Element;
import uk.ac.cam.ch.wwmm.opsin.NameToStructure;
import uk.ac.cam.ch.wwmm.opsin.NameToStructureException;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class OpsinExample {
    String filename;

    public OpsinExample(String filename) throws NameToStructureException, IOException {
        this.filename = filename;
    }

    // Reads one chemical name per line and prints the CML generated for it.
    public void run() throws NameToStructureException, IOException {
        NameToStructure nameToStructure = NameToStructure.getInstance();
        // try-with-resources closes the reader even if parsing throws
        try (BufferedReader reader = new BufferedReader(new FileReader(new File(filename)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Element cmlElement = nameToStructure.parseToCML(line);
                // Guard against a null result for a name that could not be parsed
                if (cmlElement != null) {
                    System.out.println(cmlElement.toXML());
                }
            }
        }
    }

    public static void main(String[] args) throws IOException, NameToStructureException {
        OpsinExample oe = new OpsinExample("/Users/guhar/Documents/Presentations/trec/sw/chemnames.txt");
        oe.run();
    }
}
58. Lexichem Example
from openeye.oechem import *
from openeye.oeiupac import *

mol = OEGraphMol()
# One chemical name per line; print the canonical SMILES for each
for line in open('chemnames.txt'):
    line = line.strip()
    mol.Clear()
    OEParseIUPACName(mol, line)
    print(OECreateCanSmiString(mol))