ALGORITHMS FOR AUTOMATIC TAUTOMER GENERATION AND THEIR APPLICATIONS
Nikolay T. Kochev1, Vesselina H. Paskaleva1, Nina Jeliazkova2
1University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry;
2Ideaconsult Ltd, 4 A. Kanchev str., Sofia 1000, Bulgaria

Ambit-Tautomer Basic Features
• CDK-based (cdk.sf.net) structure representation, input, output and information processing
• Supports standard chemical formats: SMILES, InChI, MOL/SDF file, CML
• Exhaustive tautomer generation
• Customizable set of rules and post-generation filters
• Set of predefined rules
• Tautomer ranking based on simple empirical rules

Tautomer Generation Flow Chart
Structure input: OC(O)=C(N)C

Customizable set of rules
• Basic set of 1-3 and 1-5 proton shift rules
• Additional rules: 1-7 proton shifts, chlorine atom shifts
• Rule description based on SMARTS

Tautomer generation algorithms
• Pure combinatorial algorithm
• Incremental approach (based on a depth-first search algorithm) for rule combination, with local rule corrections and refinement on the way

[Figure: pure combinatorial algorithm applied to the example structure (CDK representation). Each applicable rule has two states (0 ↔ 1), and each tautomer is described as a binary combination of the rule states: 00, 01, 10, 11.]
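The binary encoding described above can be illustrated with a short Python sketch. It is not Ambit-Tautomer code: the rule instances are written as hypothetical (pattern, atom indices) tuples borrowed from the poster's notation (e.g. OC=C at 213), and 0/1 is read as "rule not applied / applied".

```python
from itertools import product

# Hypothetical rule instances in the poster's "pattern at atoms" notation,
# e.g. OC=C at 213 -> ("OC=C", (2, 1, 3)). Only their count matters here.
RULE_INSTANCES = [("OC=C", (2, 1, 3)), ("NC=C", (4, 3, 1))]


def binary_combinations(n_rules: int) -> list:
    """Every rule-state vector as a bit string (0 = rule not applied, 1 = applied).

    Each vector encodes one candidate tautomer, so there are 2 ** n_rules of them.
    """
    return ["".join(str(bit) for bit in bits) for bits in product((0, 1), repeat=n_rules)]


if __name__ == "__main__":
    print(binary_combinations(len(RULE_INSTANCES)))  # ['00', '01', '10', '11']
```

The number of candidates doubles with each additional rule instance, so n rules give 2^n combinations to examine.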

[Figure 2 plot: mean absolute error of XLogP computed without tautomers and averaged over all tautomers, plotted against the number of tautomers per structure (bins 2÷10, 11÷30, 31÷50, 52÷100, 102÷192, 204÷292, 302÷1318); y-axis: mean error.]

Number of PaDEL descriptors that have RSD > RSD threshold, for three test structures (one of them pemoline):

RSD threshold:   0.1   0.3   0.5   1.0
                 180   124    99    71
                 217   151   108    80
                 239   168   138   113
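The statistic behind these counts can be sketched as follows. Assumptions: RSD is taken as the population standard deviation of a descriptor over all tautomers of one structure divided by the absolute mean (the poster does not say whether sample or population deviation was used), and the descriptor names and values below are made up for illustration; in the study the descriptors were computed with PaDEL.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def rsd(values: Sequence[float]) -> float:
    """Relative standard deviation: population std / |mean| (assumed definition)."""
    m = mean(values)
    if m == 0:
        # Undefined ratio; treat any spread around a zero mean as infinitely variable.
        return float("inf") if pstdev(values) > 0 else 0.0
    return pstdev(values) / abs(m)


def count_above(descriptors: Dict[str, List[float]],
                thresholds: Sequence[float]) -> Dict[float, int]:
    """For each threshold, count descriptors whose RSD across tautomers exceeds it."""
    rsds = [rsd(v) for v in descriptors.values()]
    return {t: sum(r > t for r in rsds) for t in thresholds}


if __name__ == "__main__":
    # Hypothetical values of three descriptors for the three tautomers of one structure.
    values_per_tautomer = {
        "desc_a": [114.0, 114.0, 114.0],  # tautomer-invariant
        "desc_b": [1, 1, 2],
        "desc_c": [0.2, -0.4, 0.9],
    }
    print(count_above(values_per_tautomer, [0.1, 0.3, 0.5, 1.0]))
```

A tautomer-invariant descriptor such as desc_a never crosses any threshold, while descriptors sensitive to hydrogen positions or bond orders can cross several.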

Ambit-Tautomer [1] is part of the Ambit2 software package [2]. It is
distributed under the LGPL license and uses the Chemistry
Development Kit (CDK) library [3] for basic chemoinformatics
functionality. Ambit-Tautomer uses a depth-first search
algorithm combined with a set of rules for tautomeric
transformations. The Ambit implementation of the OpenTox web
services for predictive toxicology [4] is being extended to include
the tautomer generation algorithm. A web page providing online
tautomer generation by several different algorithms, including
Ambit-Tautomer, is available at:
http://apps.ideaconsult.net:8080/ambit2/depict/tautomer.
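A minimal sketch of a depth-first search over rule states with refinement of the rule list, under simplifying assumptions: rule instances are hypothetical (pattern, atom indices) tuples, and "refinement" here only drops rules that share atoms with the rule just applied. The real algorithm works on the modified structure and can also pick up new rule instances (the poster's search-tree figure lists N=CC at 435 among the used rules together with NC=C at 431); this sketch leaves that out.

```python
from typing import List, Tuple

# Hypothetical rule-instance representation: (pattern, matched atom indices).
Rule = Tuple[str, Tuple[int, ...]]


def overlaps(a: Rule, b: Rule) -> bool:
    """Two rule instances overlap when they touch at least one common atom."""
    return bool(set(a[1]) & set(b[1]))


def dfs_combinations(unused: List[Rule], used: List[Rule],
                     leaves: List[List[Rule]]) -> None:
    """Depth-first search over rule states with refinement of the remaining rule list.

    The first unused rule is the current rule and produces two states:
    0 = not applied, 1 = applied. After applying it, rules that overlap it are
    removed from the remaining list (a simplified form of refinement).
    """
    if not unused:
        leaves.append(used)  # one complete combination of applied rules
        return
    current, rest = unused[0], unused[1:]
    dfs_combinations(rest, used, leaves)  # state 0: current rule not applied
    refined = [r for r in rest if not overlaps(r, current)]
    dfs_combinations(refined, used + [current], leaves)  # state 1: applied


if __name__ == "__main__":
    rules = [("OC=C", (2, 1, 3)), ("OC=C", (0, 1, 3)), ("NC=C", (4, 3, 1))]
    leaves: List[List[Rule]] = []
    dfs_combinations(rules, [], leaves)
    for combo in leaves:
        print([f"{p} at {''.join(map(str, atoms))}" for p, atoms in combo])
```

Because the three example rules all share atoms 1 and 3, the search yields only four leaves instead of the 2^3 = 8 blind combinations; non-overlapping rules are still combined freely.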

[Figure: incremental, depth-first search generation of tautomers for OC(O)=C(N)C, atoms numbered 0 to 5 (O0, C1, O2, C3, N4, C5 of the CH3 group). Starting from the initial rule list, each node of the search tree shows its used and unused rules. A rule instance is written as a pattern followed by the matched atom indices, e.g. OC=C at 213, OC=C at 013, O=CC at 013, NC=C at 431, N=CC at 435.]
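Reading the figure's notation as "pattern at matched atom indices" (our interpretation, which fits the numbered example OC(O)=C(N)C), one 1-3 proton shift can be applied to a hand-built molecular graph. The sketch uses plain Python rather than CDK; the bond dictionary layout and the function name shift_13 are hypothetical. Applying OC=C at 213 moves the hydroxyl hydrogen from O2 to C3 and gives OC(=O)C(N)C, the carboxylic acid form (alanine).

```python
from typing import Dict, FrozenSet, List, Tuple

Bonds = Dict[FrozenSet[int], int]  # frozenset({i, j}) -> bond order

# OC(O)=C(N)C with the figure's numbering: 0 O, 1 C, 2 O, 3 C, 4 N, 5 C (CH3)
SYMBOLS = ["O", "C", "O", "C", "N", "C"]  # kept for reference only
H_COUNT = [1, 0, 1, 0, 2, 3]
BONDS: Bonds = {
    frozenset({0, 1}): 1,
    frozenset({1, 2}): 1,
    frozenset({1, 3}): 2,
    frozenset({3, 4}): 1,
    frozenset({3, 5}): 1,
}


def shift_13(h_count: List[int], bonds: Bonds, donor: int, center: int,
             acceptor: int) -> Tuple[List[int], Bonds]:
    """Apply a 1-3 proton shift H-X-C=Y -> X=C-Y-H at the given atom indices.

    Returns new copies of the H counts and bond orders; the inputs are not modified.
    """
    xc, cy = frozenset({donor, center}), frozenset({center, acceptor})
    if h_count[donor] < 1 or bonds.get(xc) != 1 or bonds.get(cy) != 2:
        raise ValueError("the 1-3 shift pattern does not match these atoms")
    new_h, new_bonds = list(h_count), dict(bonds)
    new_h[donor] -= 1     # donor loses its hydrogen
    new_h[acceptor] += 1  # acceptor gains it
    new_bonds[xc] = 2     # single bond becomes double
    new_bonds[cy] = 1     # double bond becomes single
    return new_h, new_bonds


if __name__ == "__main__":
    # "OC=C at 213": donor O2, center C1, acceptor C3 -> OC(=O)C(N)C
    h, b = shift_13(H_COUNT, BONDS, 2, 1, 3)
    print(h)  # [1, 0, 0, 1, 2, 3]
```

The same function applied as NC=C at 431 (donor N4, center C3, acceptor C1) gives the imine form OC(O)C(=N)C.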

Post-generation filtering: duplicates, topological equivalency, allene atoms, incorrect structures, …
Ranking
Result output

QSAR/QSPR Cheminfo Processing Flow Chart
• Structure input: C1=CN(C(N1)=S)C (methimazole) /SMILES, InChI, *.mol, CML/
• CDK representation (CDK container)
• Generate tautomers
• Generate 2D connection table; generate 3D (tautomer 3D models)
• Calculate 1D, 2D, 3D molecular descriptors, e.g. NA = 13, NH = 6, MW = 114.03, Z = 32, W = 40, ATSc1 = 0.14, …
• Calculate fingerprints (bit-vectors): hashed fingerprint 10001...111011; key-based fingerprint 0 0 1 0 1 . . . 0 0 1 0 1 0
• Group counts, additive schemes
• QSPR: models of physicochemical properties (LogP, BP, MP, MR, …)
• QSAR: models of biological activities (ADME, toxicity, mutagenicity, biodegradation, …)
• Similarity search in a chemical database: list of most similar structures
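The fingerprint and similarity-search steps can be sketched in plain Python. The poster does not name the similarity measure; the Tanimoto coefficient on bit-vector fingerprints is a common choice and is assumed here. Fingerprints are toy sets of on-bit positions, not real CDK fingerprints, and each tautomer is searched separately, in the spirit of per-tautomer similarity results.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def top_k(query: Set[int], database: Dict[str, Set[int]],
          k: int = 5) -> List[Tuple[str, float]]:
    """Rank database entries by similarity to the query and return the k best."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy fingerprints; each tautomer is used as a separate query.
    database = {"cpd1": {1, 2, 3, 5}, "cpd2": {1, 2, 4, 9}, "cpd3": {6, 7}}
    tautomers = {"tautomer_1": {1, 2, 3, 5, 8}, "tautomer_2": {1, 2, 4, 5, 9}}
    for name, fp in tautomers.items():
        print(name, top_k(fp, database, k=2))
```

Different tautomers of the same compound can retrieve different nearest neighbours, which is why each tautomer is queried on its own here.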
Table 1. The similarity search results for the three tautomers of methimazole. Each column contains the five most similar structures to the tautomer. Similarity search is performed in a database with 553477 compounds (subset of the PubChem database). [Hit structures not reproduced; similarity values by rank.]

Rank   Tautomer 1   Tautomer 2   Tautomer 3
1.     0.62         0.71         0.47
2.     0.60         0.71         0.45
3.     0.59         0.64         0.44
4.     0.58         0.57         0.44
5.     0.54         0.57         0.43

Pure combinatorial algorithm: generation of all possible combinations of the rule states.
Incremental algorithm: based on depth-first search with refinement of the rule list at each step.

Table 3. The number of descriptors (out of a total of 863) that exhibit a relative standard deviation (RSD, due to tautomerism) larger than particular thresholds: 0.1, 0.3, 0.5, 1.0.

The structural information was processed according to the presented flow chart. We studied the influence of tautomer information on various processing stages: descriptor calculation (Table 3), similarity searching (Table 1), and QSAR/QSPR modeling of Ames mutagenicity and LogP (Fig. 2 and Table 2).

Violuric acid tautomers (SMILES)     Ames mutagenicity (model)   XLogP
O=C1NC(=O)C(=NO)C(=O)N1              1                            0.135
O=C1N=C(O)N=C(O)C1(=NO)              0                           -0.086
O=C1N=C(O)C(=NO)C(O)=N1              0                            0.267
O=C1N=C(O)C(=NO)C(=O)N1              1                            0.041
O=C1N=C(O)NC(=O)C1(=NO)              1                            0.361
O=NC1=C(O)N=C(O)N=C1(O)              1                           -0.102
O=NC=1C(=O)NC(O)=NC=1(O)             1                           -0.084
O=NC=1C(=O)N=C(O)NC=1(O)             1                            1.230
O=NC=1C(O)=NC(=O)NC=1(O)             1                            0.698
O=NC=1C(=O)NC(=O)NC=1(O)             1                            0.363
O=NC1C(O)=NC(=O)N=C1(O)              1                           -0.277
O=NC1C(=O)N=C(O)N=C1(O)              1                           -1.056
O=NC1C(=O)NC(=O)N=C1(O)              1                           -0.932
O=NC1C(=O)N=C(O)NC1(=O)              1                           -1.038
O=NC1C(=O)NC(=O)NC1(=O)              1                           -1.267
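A small worked example on the violuric acid model outputs of Table 2: the values are copied from the table, while the summary (mean and range of XLogP, count of tautomers predicted Ames-positive) is our own illustration of how strongly the predictions depend on the chosen tautomer.

```python
from statistics import mean
from typing import Dict, List

# Model outputs for the 15 violuric acid tautomers, in the order listed in Table 2.
XLOGP = [0.135, -0.086, 0.267, 0.041, 0.361, -0.102, -0.084, 1.230,
         0.698, 0.363, -0.277, -1.056, -0.932, -1.038, -1.267]
AMES = [1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # 1 = predicted mutagenic


def summarize(xlogp: List[float], ames: List[int]) -> Dict[str, float]:
    """Tautomer-averaged XLogP, its range, and the number of Ames-positive tautomers."""
    return {
        "mean_xlogp": mean(xlogp),
        "min_xlogp": min(xlogp),
        "max_xlogp": max(xlogp),
        "n_mutagenic": sum(ames),
        "n_tautomers": len(ames),
    }


if __name__ == "__main__":
    s = summarize(XLOGP, AMES)
    print(f"mean XLogP over {s['n_tautomers']} tautomers: {s['mean_xlogp']:.3f}")
    print(f"XLogP range: {s['min_xlogp']:.3f} to {s['max_xlogp']:.3f}")
    print(f"Ames-positive tautomers: {s['n_mutagenic']} of {s['n_tautomers']}")
```

Run as given, it reports a mean XLogP of -0.116 over 15 tautomers, values from -1.267 to 1.230, and 13 of 15 tautomers predicted Ames-positive. These aggregates do not depend on how the SMILES and model columns are paired row by row.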

Table 2. The values of the Ames mutagenicity model and the XLogP model for all tautomers of violuric acid.

Figure 2. The mean absolute errors of the XLogP model compared with the
errors obtained from the model values averaged over all tautomers
of each test structure. The statistics are calculated
for 8327 test structures.
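The comparison in Figure 2 can be sketched as follows. The record layout is hypothetical; "no tautomers" is read as the prediction for the input structure only, and "all tautomers" as the mean prediction over its tautomers. The bins follow the x-axis of the figure, e.g. (2, 10) and (11, 30).

```python
from statistics import mean
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical record for one test structure:
# {"observed": float, "input_pred": float, "tautomer_preds": [float, ...]}
Record = Dict[str, Any]


def mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error between predictions and observations."""
    return mean(abs(p - o) for p, o in zip(pred, obs))


def compare(records: List[Record]) -> Dict[str, float]:
    """MAE of the input-structure prediction vs. the tautomer-averaged prediction."""
    obs = [r["observed"] for r in records]
    single = [r["input_pred"] for r in records]
    averaged = [mean(r["tautomer_preds"]) for r in records]
    return {"no_tautomers": mae(single, obs), "all_tautomers": mae(averaged, obs)}


def compare_by_bin(records: List[Record],
                   bins: List[Tuple[int, int]]) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Repeat the comparison within bins of tautomer count; empty bins are skipped."""
    result = {}
    for lo, hi in bins:
        subset = [r for r in records if lo <= len(r["tautomer_preds"]) <= hi]
        if subset:
            result[(lo, hi)] = compare(subset)
    return result


if __name__ == "__main__":
    toy = [
        {"observed": 1.0, "input_pred": 2.0, "tautomer_preds": [2.0, 0.0]},
        {"observed": 0.0, "input_pred": 0.5, "tautomer_preds": [0.5, 1.5]},
    ]
    print(compare_by_bin(toy, [(2, 10), (11, 30)]))
```

With real data, each record would hold the measured logP of one test structure and the XLogP values of the input structure and of all its generated tautomers.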


Legend (search-tree figure): a marker indicates the current rule, which is used to generate two possible states.

Substructure search

Limitations of the pure combinatorial approach:
- simple combinations do not work
- rule conflicts are possible
- some tautomers might be omitted
- a more sophisticated approach is needed
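The rule-conflict problem can be made concrete. Two rule instances that share atoms, such as OC=C at 213 and NC=C at 431 (both use atoms 1 and 3), cannot in general be applied independently. This sketch, using a hypothetical (pattern, atom indices) representation, simply discards binary state vectors that apply two overlapping rules together. Discarding is a simplification; it shows one way a purely combinatorial scheme can miss tautomers that would need both rules.

```python
from itertools import combinations, product
from typing import List, Tuple

Rule = Tuple[str, Tuple[int, ...]]  # hypothetical: (pattern, matched atom indices)


def conflicting_pairs(rules: List[Rule]) -> List[Tuple[int, int]]:
    """Index pairs of rule instances that share at least one atom (overlapping rules)."""
    return [(i, j) for i, j in combinations(range(len(rules)), 2)
            if set(rules[i][1]) & set(rules[j][1])]


def consistent_states(rules: List[Rule]) -> List[Tuple[int, ...]]:
    """Binary rule-state vectors in which no two overlapping rules are both applied."""
    pairs = conflicting_pairs(rules)
    return [bits for bits in product((0, 1), repeat=len(rules))
            if not any(bits[i] and bits[j] for i, j in pairs)]


if __name__ == "__main__":
    rules = [("OC=C", (2, 1, 3)), ("NC=C", (4, 3, 1))]
    print(conflicting_pairs(rules))   # [(0, 1)]
    print(consistent_states(rules))   # [(0, 0), (0, 1), (1, 0)]
```

Rules that do not overlap still combine freely, so only the conflicting state vectors are lost.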

[Figure: combinations of non-overlapping rules vs. overlapping rules in the pure combinatorial algorithm.]

Software characteristics

Figure 1. AMBIT2 Tautomer generation test page

References
[1] Kochev N. T., Paskaleva V. H., Jeliazkova N., "Ambit-Tautomer: An Open Source Tool for Tautomer Generation". Mol. Inf. 2013; 32:481–504.
[2] AMBIT project, http://ambit.sourceforge.net
[3] Steinbeck C., Hoppe C., Kuhn S., Guha R., Willighagen E.L., "Recent Developments of the Chemistry Development Kit (CDK) – An Open-Source Java Library for Chemo- and Bioinformatics". Curr. Pharm. Des. 2006; 12(17):2111-2120 (DOI: 10.2174/138161206777585274)
[4] Jeliazkova N., Jeliazkov V., "AMBIT RESTful web services: an implementation of the OpenTox application programming interface". J. Cheminform. 2011; 3:18 (DOI: 10.1186/1758-2946-3-18)
