OpenTox Euro 2013 poster
http://www.opentox.org/meet/opentoxeu2013/opentoxeu2013posters/
6. Ambit-TAUTOMER – a Software Tool for Automatic Tautomer Generation, Nina Jeliazkova (Ideaconsult Ltd)
Ambit-Tautomer [1] is an open source Java library for automatic generation of all tautomers of a given chemical compound. It is implemented on top of the Chemistry Development Kit (CDK) [2]. The system includes three main algorithms: a pure combinatorial method, an improved combinatorial method and an incremental algorithm. The tautomer generator uses a set of predefined but customizable rules. The rules are defined in Daylight SMILES/SMARTS line notation and support the basic types of tautomerism (1-3, 1-5 and 1-7 proton shifts). The pure combinatorial method generates all tautomeric forms by considering all possible combinations of the matched rule states. The improved combinatorial method uses sub-combinations based on rule clustering. The incremental algorithm applies depth-first search to handle sophisticated cases of overlapping rules. Additionally, rule pre-filtering and tautomer post-filtering are applied to fine-tune the generation process. The tautomer generator implements tautomer ranking based on empirical rules defined in terms of relative energy differences.
The Ambit-Tautomer library is applied to improve the Ambit database storage of chemical structures and, accordingly, to implement search procedures that take tautomerism information into account. The tautomer sets are also used to calculate modified values of the original molecular descriptors in order to improve existing QSAR/QSPR models. The Ambit-Tautomer module is implemented as an open source Java package within the Ambit open source software for chemoinformatics and data management [3,4] and is available as a Java library, a command line application [5] and an OpenTox Algorithm API compatible web service [6]. The Ambit package is available as online web services and as a downloadable application. A web page providing online tautomer generation by Ambit-Tautomer and several different software packages is available at http://apps.ideaconsult.net:8080/ambit2/depict/tautomer.
References
[1] Kochev, N. T., Paskaleva, V. H. and Jeliazkova, N., Ambit-Tautomer: An Open Source Tool for Tautomer Generation. Mol. Inf., 32: 481–504, 2013
[2] C. Steinbeck, Y. Han, S. Kuhn, O. Horlacher, E. Luttmann, E. Willighagen, The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics, J. Chem. Inf. Comput. Sci., 43: 493–500, 2003
[3] Jeliazkova N., Jeliazkov V. AMBIT RESTful web services: an implementation of the OpenTox application programming interface, Journal of Cheminformatics 2011, 3:18, doi:10.1186/1758-2946-3-18.
[4] http://ambit.sourceforge.net
[5] https://github.com/ideaconsult/examples-ambit/tree/master/tautomers-example
[6] http://apps.ideaconsult.net:8080/ambit2/algorithm/tautomers
ALGORITHMS FOR AUTOMATIC TAUTOMER GENERATION AND THEIR APPLICATIONS
Nikolay T. Kochev¹, Vesselina H. Paskaleva¹, Nina Jeliazkova²
¹University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry;
²Ideaconsult Ltd, 4 A. Kanchev str., Sofia 1000, Bulgaria
Ambit-Tautomer Basic Features

Software characteristics
• CDK.sf.net based structure representation, input, output and info processing
• Supports standard chemical formats: SMILES, InChI, MOL/SDF file, CML
• Exhaustive tautomer generation
• Customizable set of rules and post-generation filters
• Set of predefined rules
• Tautomer ranking based on simple empirical rules

Customizable set of rules
• Basic set of 1-3 and 1-5 proton shift rules
• Additional rules: 1-7 proton shifts, chlorine atom shifts
• Rule description based on SMARTS

Tautomer generation algorithms
• Pure combinatorial algorithm
• Incremental approach (based on depth-first search algorithm) for rule combination with local rule corrections and refinement on the way

Tautomer Generation Flow Chart
Structure input: OC(O)=C(N)C
[Flow chart: the input structure in CDK representation. Each rule has two states, 0 ↔ 1, and each tautomer is described as a binary combination of rule states: 00, 01, 10, 11.]
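The binary description of tautomers can be illustrated with a minimal sketch. It assumes the matched rules do not overlap, so that every combination of rule states is valid; the rule labels `rule_A` and `rule_B` are hypothetical placeholders, not names from the Ambit-Tautomer rule set, and the function is not part of the library.

```python
from itertools import product

def enumerate_rule_states(rule_names):
    """Yield every assignment of state 0 or 1 to each matched rule.

    Each assignment is returned as a binary code string plus a mapping
    from rule name to its state.
    """
    for states in product((0, 1), repeat=len(rule_names)):
        code = "".join(str(s) for s in states)
        yield code, dict(zip(rule_names, states))

# Hypothetical labels for two non-overlapping rule matches.
rules = ["rule_A", "rule_B"]
codes = [code for code, _ in enumerate_rule_states(rules)]
print(codes)  # ['00', '01', '10', '11']
```

With n independent rule matches this enumeration produces 2^n candidate states, which is why the purely combinatorial approach grows quickly with the number of matched rules.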
[Figure 2 plot. Series: XLogP (no tautomers) and XLogP (all tautomers). Vertical axis: mean error. Horizontal axis: number of tautomers per structure, in bins 2–10, 11–30, 31–50, 52–100, 102–192, 204–292, 302–1318.]
Ames mutagenicity (model), violuric acid tautomers: 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
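Number of PaDEL descriptors that have RSD > RSD threshold
RSD threshold:          0.1   0.3   0.5   1.0
Number of descriptors:  180   124    99    71
pemoline
RSD threshold:          0.1   0.3   0.5   1.0
Number of descriptors:  217   151   108    80
Number of descriptors:  239   168   138   113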
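The counting behind these numbers can be sketched as follows. This is only an illustration under stated assumptions: RSD is taken as the population standard deviation divided by the absolute value of the mean, descriptors with a zero mean are skipped, and the descriptor names and values are invented for the example (they are not PaDEL output).

```python
from statistics import mean, pstdev

def relative_std_dev(values):
    """RSD = population standard deviation / |mean|; None when the mean is zero."""
    m = mean(values)
    if m == 0:
        return None
    return pstdev(values) / abs(m)

def count_above(descriptor_table, thresholds):
    """For each threshold, count descriptors whose RSD across tautomers exceeds it."""
    rsds = [relative_std_dev(v) for v in descriptor_table.values()]
    return {t: sum(1 for r in rsds if r is not None and r > t) for t in thresholds}

# Hypothetical data: one list per descriptor, one value per tautomer.
table = {
    "desc_a": [1.0, 1.0, 1.0],   # identical for all tautomers, RSD = 0
    "desc_b": [1.0, 2.0, 3.0],   # RSD is about 0.41
    "desc_c": [-1.0, 0.5, 2.0],  # RSD is about 2.45
}
counts = count_above(table, [0.1, 0.3, 0.5, 1.0])
print(counts)  # {0.1: 2, 0.3: 2, 0.5: 1, 1.0: 1}
```

Whether a sample or population standard deviation is used changes the individual RSD values slightly, so the choice here should be read as an assumption of the sketch.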
Ambit-Tautomer [1] is part of the Ambit2 software package [2],
distributed under the LGPL license and using the Chemistry
Development Kit (CDK) library [3] for basic chemoinformatics
functionality. Ambit-Tautomer utilizes a depth-first search
algorithm, combined with a set of rules for tautomeric
transformations. The Ambit implementation of the OpenTox web
services for predictive toxicology [4] is being extended to include
the tautomer generation algorithm. A web page providing online
tautomer generation by several different algorithms, including
Ambit-Tautomer, is available at:
http://apps.ideaconsult.net:8080/ambit2/depict/tautomer.
[Flow chart: depth-first search tree of the incremental algorithm. Each node shows the current structure with numbered atoms and its lists of used and unused rules. Rule instances are written as a pattern and the atom indices where it matches, e.g. OC=C at 2 1 3, NC=C at 4 3 1, N=CC at 4 3 5, O=CC at 0 1 3.]
XLogP (model), violuric acid tautomers: 0.135, -0.086, 0.267, 0.041, 0.361, -0.102, -0.084, 1.230, 0.698, 0.363, -0.277, -1.056, -0.932, -1.038, -1.267
Post-generation filtering: duplicates, topological equivalency, allene atoms, incorrect structures, …
Ranking
Result output
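The filtering and ranking stages can be sketched as a small pipeline. The sketch makes several assumptions that are not taken from the poster: duplicates are detected by identical SMILES strings (a real implementation would compare canonical structures), allenes are detected by a naive `=C=` substring check, the scores are invented illustrative numbers standing in for a relative energy, and lower scores are ranked first.

```python
def filter_and_rank(candidates, is_valid):
    """Drop duplicates and invalid structures, then sort by score (lowest first).

    candidates: iterable of (smiles, score) pairs.
    is_valid:   predicate that rejects unwanted structures.
    """
    seen = set()
    kept = []
    for smiles, score in candidates:
        if smiles in seen or not is_valid(smiles):
            continue
        seen.add(smiles)
        kept.append((smiles, score))
    return sorted(kept, key=lambda pair: pair[1])

def has_no_allene(smiles):
    """Naive string-level check used only for this illustration."""
    return "=C=" not in smiles

# Hypothetical candidates for but-3-en-2-one; scores are made up.
candidates = [
    ("C=C(O)C=C", 1.2),   # enol form
    ("CC(=O)C=C", 0.0),   # keto form
    ("CC(=O)C=C", 0.0),   # duplicate of the keto form
    ("CC(O)=C=C", 3.5),   # allene-containing form, filtered out
]
ranked = filter_and_rank(candidates, has_no_allene)
print(ranked)  # [('CC(=O)C=C', 0.0), ('C=C(O)C=C', 1.2)]
```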
QSAR/QSPR Cheminfo Processing Flow Chart
• Structure input (SMILES, InChI, *.mol, CML), e.g. methimazole: C1=CN(C(N1)=S)C
• CDK representation: connection table (CDK container)
• Generate tautomers, generate 2D, generate 3D (tautomer 3D models)
• Calculate 1D, 2D, 3D molecular descriptors, e.g. NA = 13, NH = 6, MW = 114.03, …; Z = 32, W = 40, ATSc1 = 0.14, …
• Group counts, additive schemes
• Calculate fingerprints (bit-vectors): hashed fingerprint (10001...111011), key-based fingerprint (0 0 1 0 1 . . . 0 0 1 0 1 0)
• QSAR: models of biological activities (ADME, toxicity, mutagenicity, biodegradation, …)
• QSPR: models of physicochemical properties (LogP, BP, MP, MR, …)
• Similarity search and substructure search in a chemical database: list of most similar structures
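A fingerprint-based similarity search of the kind shown in the flow chart can be sketched as below. The poster does not name the similarity measure, so the Tanimoto coefficient is used here as an assumption; fingerprints are represented as sets of on-bit indices, and the database entries are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def most_similar(query_fp, database, k=5):
    """Return the k database entries most similar to the query, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical fingerprints; bit indices carry no chemical meaning here.
query = {1, 2, 3, 4}
database = {
    "a": {1, 2, 3, 4},   # identical to the query
    "b": {1, 2},         # shares 2 of 4 bits
    "c": {5, 6},         # no bits in common
    "d": {1, 2, 3, 7},   # shares 3 bits, union of 5
}
top2 = most_similar(query, database, k=2)
print(top2)  # [('a', 1.0), ('d', 0.6)]
```

Because each tautomer has its own fingerprint, running the same search once per tautomer can return a different list of neighbours for each one.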
Pure combinatorial algorithm: generation of all possible combinations of the rule states.
Incremental algorithm: based on depth-first search with refinement of the rule list at each step.
Table 1. The similarity search results for the three tautomers of methimazole. Each column contains the five most similar structures to the tautomer. Similarity search is performed in a database with 553477 compounds (subset of the PubChem database).
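The idea of a depth-first search that re-checks the rule list at every step can be illustrated with a toy model. This is not the Ambit-Tautomer implementation: a structure is reduced to the set of sites that currently carry a mobile proton, a "rule" is a hypothetical (donor, acceptor) pair, and the site labels `O`, `C` and `N` are placeholders.

```python
def applicable_shifts(state, shifts):
    """Re-match the rules against the current state.

    A shift applies when its donor site holds a proton and its
    acceptor site does not.
    """
    return [(d, a) for d, a in shifts if d in state and a not in state]

def generate_tautomers(start, shifts):
    """Depth-first enumeration of all states reachable through proton shifts."""
    seen = set()
    stack = [frozenset(start)]
    while stack:
        state = stack.pop()          # LIFO order gives depth-first traversal
        if state in seen:
            continue
        seen.add(state)
        for donor, acceptor in applicable_shifts(state, shifts):
            stack.append((state - {donor}) | {acceptor})
    return seen

# Two hypothetical rules that overlap on site "C", listed in both directions.
shifts = [("O", "C"), ("C", "O"), ("N", "C"), ("C", "N")]
states = generate_tautomers({"O", "N"}, shifts)
print(len(states))  # 3
```

A plain combination of the two rules would give 2 × 2 = 4 states, including one where both protons move to the same site. Because the applicable rules are recomputed for each state, the search reaches only the three consistent states.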
The structural information was processed according to the presented flow chart. We studied the influence of tautomer information on various processing stages: descriptor calculation (Table 3), similarity searching (see Table 1) and QSAR/QSPR modeling of Ames mutagenicity and LogP (see Figure 2 and Table 2).

Violuric acid tautomers (SMILES notations):
O=C1NC(=O)C(=NO)C(=O)N1
O=C1N=C(O)N=C(O)C1(=NO)
O=C1N=C(O)C(=NO)C(O)=N1
O=C1N=C(O)C(=NO)C(=O)N1
O=C1N=C(O)NC(=O)C1(=NO)
O=NC1=C(O)N=C(O)N=C1(O)
O=NC=1C(=O)NC(O)=NC=1(O)
O=NC=1C(=O)N=C(O)NC=1(O)
O=NC=1C(O)=NC(=O)NC=1(O)
O=NC=1C(=O)NC(=O)NC=1(O)
O=NC1C(O)=NC(=O)N=C1(O)
O=NC1C(=O)N=C(O)N=C1(O)
O=NC1C(=O)NC(=O)N=C1(O)
O=NC1C(=O)N=C(O)NC1(=O)
O=NC1C(=O)NC(=O)NC1(=O)
Table 2. The values of the Ames mutagenicity model and the XLogP model for all tautomers of violuric acid.

Table 3. The number of descriptors (out of a total of 863) which exhibit a relative standard deviation (RSD due to the tautomerism) larger than particular thresholds: 0.1, 0.3, 0.5, 1.0.
Figure 2. The mean absolute errors for the XLogP model compared with the errors obtained from the averaged model values calculated for all tautomers of each test structure. The statistics are calculated for 8327 test structures.
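The comparison described in the caption can be sketched in a few lines. All numbers below are invented for illustration and do not come from the 8327-structure test set; the sketch only assumes that a model value is available for the input structure and for each of its tautomers.

```python
from statistics import mean

def mean_absolute_error(predicted, observed):
    """Mean of |predicted - observed| over paired values."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))

def tautomer_averaged(per_tautomer_predictions):
    """Average the model value over all tautomers of each structure."""
    return [mean(values) for values in per_tautomer_predictions]

# Hypothetical data for two structures.
observed = [1.0, 2.0]
single_form = [1.5, 1.0]                  # model value for the input form only
per_tautomer = [[1.5, 1.0], [1.0, 2.0]]   # model values for every tautomer

mae_single = mean_absolute_error(single_form, observed)
mae_averaged = mean_absolute_error(tautomer_averaged(per_tautomer), observed)
print(mae_single, mae_averaged)  # 0.75 0.375
```

In this made-up case averaging lowers the error, but the direction and size of the effect on real data depend on the model and on the number of tautomers per structure.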
Flow chart legend: a marker denotes the current rule used to generate two possible states.
Combinations of non-overlapping rules
Overlapping rules:
- simple combinations do not work
- rule conflicts are possible
- some tautomers might be omitted
- a more sophisticated approach is needed
Table 1 similarity values (rows: the five most similar structures; columns: the three tautomers of methimazole):
Rank   Tautomer 1   Tautomer 2   Tautomer 3
1.     0.62         0.71         0.47
2.     0.60         0.71         0.45
3.     0.59         0.64         0.44
4.     0.58         0.57         0.44
5.     0.54         0.57         0.43
Figure 1. AMBIT2 Tautomer generation test page
References
[1] Kochev, N. T., Paskaleva, V. H. and Jeliazkova, N., Ambit-Tautomer: An Open Source Tool for
Tautomer Generation. Mol. Inf., 32: 481–504, 2013
[2] AMBIT project, http://ambit.sourceforge.net
[3] Steinbeck C., Hoppe C., Kuhn S., Guha R., Willighagen E.L., “Recent Developments of the
Chemistry Development Kit (CDK) – An Open-Source Java Library for Chemo- and Bioinformatics”.
Curr. Pharm. Des. 2006; 12(17):2111-2120 (DOI: 10.2174/138161206777585274)
[4] Jeliazkova N., Jeliazkov V., AMBIT RESTful web services: an implementation of the OpenTox
application programming interface, Journal of Cheminformatics 2011, 3:18, doi: 10.1186/1758-2946-3-18.