Molecular File Formats
Types of File Formats
Elsevier MDL supports a number of file formats for the representation and
communication of chemical information.
Name | Description
molfiles | Each molfile describes a single molecular structure, which can contain disjoint fragments such as salts.
SDfiles | Structure-data files, which contain structures and data for any number of molecules. SDfiles are the primary format for large-scale data transfer between MDL databases.
RGfiles | An RGfile describes a single molecular query with Rgroups. Each RGfile is a combination of Ctabs defining the root molecule and each member of each Rgroup in the query.
rxnfiles | Reaction files. Each rxnfile contains the structural information for the reactants and products of a single reaction.
RDfiles | Reaction-data files. The RDfile is a more general format that can include reactions as well as molecules.
File Formats
http://c4.cabrillo.edu/404/ctfile.pdf
Connection Table [Ctab]
A connection table (Ctab) contains information describing the structural
relationships and properties of a collection of atoms. The connection table is
fundamental to all of the MDL file formats.
Counts line:
9 9 0 0 0 0 0 0 0 0999 V2000
Atom block:
-1.0200 1.5300 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5100 2.4100 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5000 2.3900 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.0000 3.2700 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0300 3.2700 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
0.5000 4.1500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.5000 4.1500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.0100 3.2800 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.0300 3.2800 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
Bond block:
1 2 1 0
2 8 1 0
2 3 2 3
3 4 1 0
4 5 2 0
4 6 1 0
6 7 2 3
7 8 1 0
8 9 2 0
Ctab Features
Parts of Ctab | Description
Counts Line | Important specifications here relate to the number of atoms, bonds, and atom lists, the chiral flag setting, and the Ctab version.
Atom Block | Specifies the atomic symbol and any mass difference, charge, stereochemistry, and associated hydrogens for each atom.
Bond Block | Specifies the two atoms connected by the bond, the bond type, and any bond stereochemistry and topology (chain or ring properties) for each bond.
Properties Block | Provides for future expandability of Ctab features, while maintaining compatibility with earlier Ctab configurations.
1. Counts Line
aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv
where
• aaa = number of atoms (current max 255)* [Generic]
• bbb = number of bonds (current max 255)* [Generic]
• lll = number of atom lists (max 30)* [Query]
• fff = (obsolete)
• ccc = chiral flag: 0 = not chiral, 1 = chiral [Generic]
• sss = number of stext entries [MDL ISIS/Desktop]
• xxx, rrr, ppp, iii = (obsolete)
• mmm = number of lines of additional properties, including the M END line.
No longer supported; the default is set to 999. [Generic]
• vvvvvv = Ctab version (V2000 in the examples here) [Generic]
The following counts line shows six atoms, five bonds, the CHIRAL flag on, and three lines in the
properties block:
6 5 0 0 1 0 3 V2000
The following counts line shows 9 atoms, 9 bonds, and the CHIRAL flag off:
9 9 0 0 0 0 0 0 0 0999 V2000
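As a rough illustration of the fixed-width layout, the sketch below reads a V2000 counts line by column position. It assumes the full layout in which four obsolete three-character fields (xxx, rrr, ppp, iii) sit between sss and mmm. The function name `parse_counts_line` and the dictionary keys are illustrative choices, not part of any MDL toolkit.

```python
def parse_counts_line(line: str) -> dict:
    """Read a V2000 counts line by column position (3 characters per field)."""

    def field(start: int) -> int:
        # Blank fields are treated as 0 (an assumption of this sketch).
        text = line[start:start + 3].strip()
        return int(text) if text else 0

    return {
        "atoms": field(0),               # aaa
        "bonds": field(3),               # bbb
        "atom_lists": field(6),          # lll
        "chiral": field(12) == 1,        # ccc
        "stext_entries": field(15),      # sss
        "property_lines": field(30),     # mmm
        "version": line[33:39].strip(),  # vvvvvv
    }


# The counts line of the nine-atom example, rebuilt with fixed-width columns.
line = "  9  9" + "  0" * 8 + "999 V2000"
counts = parse_counts_line(line)
print(counts)
```

Slicing by position rather than splitting on whitespace matters here because adjacent fields can touch, as in `0999` above.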
2. Atom Block
The Atom Block is made up of atom lines, one line per atom, with the
following format:
xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee
Field | Meaning | Values
x y z | Atom coordinates |
aaa | Atom symbol | Entry in periodic table, or L for atom list; A, Q, * for unspecified atom; LP for lone pair; or R# for Rgroup label
dd | Mass difference | -3, -2, -1, 0, 1, 2, 3, 4 (0 if value beyond these limits)
ccc | Charge | 0 = uncharged or value other than these, 1 = +3, 2 = +2, 3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3
sss | Atom stereo parity | 0 = not stereo, 1 = odd, 2 = even, 3 = either or unmarked stereo center
hhh | Hydrogen count + 1 | 1 = H0, 2 = H1, 3 = H2, 4 = H3, 5 = H4
bbb | Stereo care box | 0 = ignore stereo configuration of this double bond atom, 1 = stereo configuration of double bond atom must match
vvv | Valence | 0 = no marking (default), (1 to 14) = (1 to 14), 15 = zero valence
HHH | H0 designator | 0 = not specified, 1 = no H atoms allowed
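The atom-line fields can be read the same way. The following is a minimal sketch, assuming the V2000 column layout (three 10-character coordinates, one space, a 3-character symbol, a 2-character mass difference, then 3-character fields). The names `parse_atom_line` and `CHARGE_CODES` are hypothetical, and only the first few fields are decoded.

```python
# Formal charge for each ccc code listed in the table; code 4 is a radical.
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}


def parse_atom_line(line: str) -> dict:
    """Decode coordinates, symbol, mass difference, charge, and parity."""
    charge_code = int(line[36:39])
    return {
        "x": float(line[0:10]),
        "y": float(line[10:20]),
        "z": float(line[20:30]),
        "symbol": line[31:34].strip(),
        "mass_difference": int(line[34:36]),
        "charge": CHARGE_CODES.get(charge_code, 0),
        "doublet_radical": charge_code == 4,
        "stereo_parity": int(line[39:42]),
    }


# First atom of the nine-atom example, rebuilt with fixed-width columns.
sample = "{:10.4f}{:10.4f}{:10.4f} {:<3}{:2d}".format(-1.02, 1.53, 0.0, "C", 0)
sample += "  0" * 11  # ccc, sss, hhh, bbb, vvv, HHH, rrr, iii, mmm, nnn, eee
atom = parse_atom_line(sample)
print(atom)
```

Note that the charge code is a lookup key rather than the charge itself: code 3 means +1, and code 1 means +3.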
3. Bond Block
The Bond Block is made up of bond lines, one line per bond, with the following format:
111222tttsssxxxrrrccc
Field | Meaning | Values
111 | First atom number | 1 - number of atoms
222 | Second atom number | 1 - number of atoms
ttt | Bond type | 1 = Single, 2 = Double, 3 = Triple, 4 = Aromatic, 5 = Single or Double, 6 = Single or Aromatic, 7 = Double or Aromatic, 8 = Any
sss | Bond stereo | Single bonds: 0 = not stereo, 1 = Up, 4 = Either, 6 = Down. Double bonds: 0 = use x-, y-, z-coords from atom block to determine cis or trans, 3 = cis or trans (either) double bond
rrr | Bond topology | 0 = Either, 1 = Ring, 2 = Chain
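To show how bond lines turn into connectivity, the sketch below parses the nine bond lines of the earlier Ctab example and builds a neighbor list. It assumes the fixed three-character columns of the V2000 bond line; `parse_bond_line`, `BOND_TYPES`, and `neighbors` are names chosen for this illustration only.

```python
from collections import defaultdict

# Names for the non-query bond types in the table.
BOND_TYPES = {1: "single", 2: "double", 3: "triple", 4: "aromatic"}


def parse_bond_line(line: str) -> dict:
    """Read the first four 3-character fields of a V2000 bond line."""
    return {
        "first": int(line[0:3]),
        "second": int(line[3:6]),
        "type": int(line[6:9]),
        "stereo": int(line[9:12]),
    }


# Bond block of the nine-atom example, restored to fixed-width columns.
rows = [(1, 2, 1, 0), (2, 8, 1, 0), (2, 3, 2, 3), (3, 4, 1, 0), (4, 5, 2, 0),
        (4, 6, 1, 0), (6, 7, 2, 3), (7, 8, 1, 0), (8, 9, 2, 0)]
bond_lines = ["".join(f"{value:3d}" for value in row) for row in rows]
bonds = [parse_bond_line(line) for line in bond_lines]

# Undirected adjacency: each bond contributes a neighbor to both atoms.
neighbors = defaultdict(list)
for bond in bonds:
    neighbors[bond["first"]].append(bond["second"])
    neighbors[bond["second"]].append(bond["first"])

double_bonds = [(b["first"], b["second"]) for b in bonds
                if BOND_TYPES.get(b["type"]) == "double"]
print(sorted(neighbors[2]), double_bonds)
```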
Mol File
A molfile consists of a header block and a connection table. The
example is a molfile for alanine (the structure drawing and file listing are not reproduced here), annotated as follows:
Header block: identifies the molfile: molecule name, user's name, program, date, and other miscellaneous information and comments.
Properties block, charges: atom 4 has charge +1; atom 6 has charge -1.
Properties block, isotopes: 1 entry for an isotope; atom 3 has mass 13.
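The charge and isotope annotations correspond to `M  CHG` and `M  ISO` lines in the properties block. Because the file listing itself is not part of this text, the two sample lines below are reconstructed from the annotations and should be read as an assumption. The sketch also assumes whitespace-separated tokens (an entry count followed by atom/value pairs); `parse_property_line` is a hypothetical helper.

```python
def parse_property_line(line: str) -> tuple:
    """Return (kind, {atom_number: value}) for an 'M  CHG' or 'M  ISO' line."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "M":
        raise ValueError(f"not a counted property line: {line!r}")
    kind = tokens[1]
    count = int(tokens[2])  # number of atom/value pairs on this line
    values = [int(token) for token in tokens[3:3 + 2 * count]]
    if len(values) != 2 * count:
        raise ValueError("fewer atom/value pairs than the entry count states")
    # Even positions are atom numbers, odd positions are their values.
    return kind, dict(zip(values[0::2], values[1::2]))


# Reconstructed sample lines (assumed, see the lead-in).
charges = parse_property_line("M  CHG  2   4   1   6  -1")
isotopes = parse_property_line("M  ISO  1   3  13")
print(charges, isotopes)
```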
Representation of Stereochemistry
What is Stereochemistry?
http://www.chemhelper.com/enantiomers.html
Representation of Stereochemistry: Atom Block
Representation of Stereochemistry: Bond Block
1 = stereo bond up
RGfiles
In RGfiles, lines beginning with $ define the overall structure of the Rgroup query; the
molfile header block is embedded in the Rgroup header block. In addition to the
primary connection table (Ctab block) for the root structure, a Ctab block defines each
member (*m) within each Rgroup (*r).
Example of RGfile
SDfile
An SDfile (structure-data file) contains the structural information and associated data items for
one or more compounds.
*l is repeated for each line of data
*d is repeated for each data item
*c is repeated for each compound
Example of SDfile
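A minimal sketch of reading such a file follows. It assumes the usual SDfile conventions: each compound record ends with a `$$$$` line, the structure ends at `M  END`, and each data item starts with a `>` header line carrying the field name in angle brackets, followed by its data lines and a blank line. The sample record is made up for illustration, and `parse_sdfile` is a hypothetical helper.

```python
import re


def parse_record(lines: list) -> dict:
    """Split one compound record into its molblock and its data items."""
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")), None)
    if end is None:
        raise ValueError("record has no 'M  END' line")
    data, name = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            match = re.search(r"<([^>]+)>", line)
            name = match.group(1) if match else None
            if name is not None:
                data[name] = []
        elif not line.strip():
            name = None  # a blank line closes the current data item
        elif name is not None:
            data[name].append(line)
    return {"molblock": "\n".join(lines[:end + 1]),
            "data": {key: "\n".join(value) for key, value in data.items()}}


def parse_sdfile(text: str) -> list:
    """Return one dict per compound; records are delimited by '$$$$'."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith("$$$$"):
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Made-up single-record sample (one carbon atom, two data items).
sample = "\n".join([
    "methane", "  example", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    ">  <NAME>", "methane", "",
    ">  <MOLWEIGHT>", "16.04", "",
    "$$$$", "",
])
records = parse_sdfile(sample)
print(records[0]["data"])
```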
RXNfile
Rxnfiles contain structural data for the reactants and
products of a reaction.
where:
*r is repeated for each reactant
*p is repeated for each product
RXNfile example
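The reactant/product split can be sketched in a few lines. The sketch assumes the V2000 rxnfile layout: a `$RXN` line, three further header lines, a counts line `rrrppp` giving the number of reactants and products, and then one molfile per component, each introduced by a `$MOL` line. The input is a structural placeholder (two one-atom molfiles), not a real reaction, and `split_rxnfile` and `toy_molfile` are hypothetical names.

```python
def split_rxnfile(text: str) -> dict:
    """Separate the molfiles of a rxnfile into reactants and products."""
    lines = text.splitlines()
    if len(lines) < 5 or not lines[0].startswith("$RXN"):
        raise ValueError("not a rxnfile: expected a $RXN header")
    # lines[1:4] hold the reaction name, user/program/date line, and comment.
    n_reactants = int(lines[4][0:3])
    n_products = int(lines[4][3:6])
    molblocks = []
    for line in lines[5:]:
        if line.startswith("$MOL"):
            molblocks.append([])       # start a new component
        elif molblocks:
            molblocks[-1].append(line)
    if len(molblocks) != n_reactants + n_products:
        raise ValueError("component count does not match the counts line")
    return {"reactants": molblocks[:n_reactants],
            "products": molblocks[n_reactants:]}


def toy_molfile(name: str) -> list:
    """One-atom placeholder molfile, used only to exercise the splitter."""
    return [name, "", "",
            "  1  0  0  0  0  0  0  0  0  0999 V2000",
            "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "M  END"]


sample = "\n".join(["$RXN", "toy reaction", "", "", "  1  1",
                    "$MOL", *toy_molfile("reactant"),
                    "$MOL", *toy_molfile("product")])
reaction = split_rxnfile(sample)
print(len(reaction["reactants"]), len(reaction["products"]))
```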
RDfiles
• An RDfile (reaction-data file) consists of a set of editable "records". Each record
defines a molecule or reaction, and its associated data.
• The [RDfile Header] must occur at the beginning of the physical file and
identifies the file as an RDfile. A version stamp of 1 is given for future expansion
of the format.
• $DATM: Date/time (M/D/Y, H:m) stamp. This line is treated as a comment and
ignored when the file is read.
*d is repeated for each data item
*r is repeated for each reaction or molecule
RDfile example
Mol2 files from TRIPOS
The mol2 format originated at Tripos. It contains atom coordinates, bonds, and substructure
information, and supports partial charges and isotopes.
• Lines 1, 2, 3, 5, and 6 of the example are comments. They contain
the molecule name and information about the time
the molecule was created and last modified.
• Lines 8, 15, 28, and 41 of the example are Record
Type Indicators (RTIs). An RTI indicates the type
of data that follows it in a .mol2 file.
• Lines 9-12, 16-27, 29-40, and 42 are all data
records.
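The comment / RTI / data-record distinction described above can be sketched as a small section splitter. It assumes that comment lines start with `#` and that every RTI starts with `@<TRIPOS>`; the sample text is a made-up two-atom fragment, not the example from the slide, and `split_mol2_sections` is a hypothetical helper.

```python
RTI_PREFIX = "@<TRIPOS>"


def split_mol2_sections(text: str) -> dict:
    """Group the data records of a mol2 file under their RTI names."""
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if line.startswith(RTI_PREFIX):
            current = line[len(RTI_PREFIX):].strip()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)  # data record for the open RTI
    return sections


# Made-up sample, for illustration only.
sample = "\n".join([
    "# Name: example",
    "@<TRIPOS>MOLECULE",
    "example",
    "2 1 1",
    "SMALL",
    "NO_CHARGES",
    "@<TRIPOS>ATOM",
    "1 C1 0.000 0.000 0.000 C.3 1 MOL1 0.000",
    "2 C2 1.540 0.000 0.000 C.3 1 MOL1 0.000",
    "@<TRIPOS>BOND",
    "1 1 2 1",
])
sections = split_mol2_sections(sample)
print({name: len(records) for name, records in sections.items()})
```

Blank lines are kept as data records because, in the MOLECULE section, a line's position determines its meaning.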
Parts of mol2 file
@<TRIPOS>MOLECULE
The first data line is the name of the molecule. The second data line contains the number of atoms, bonds,
substructures, features, and sets associated with the molecule. The third data line is the molecule type. The fourth data
line tells the type of charges associated with the molecule. The fifth data line contains the internal SYBYL status bits
associated with the molecule. The last data line contains any comment which may be associated with the molecule.
@<TRIPOS>ATOM
atom_id atom_name x y z atom_type [subst_id [subst_name [charge [status_bit]]]]
Example:
1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT
In the example above, the atom has ID number 1. It is named CA and is located at (-0.149, 0.299, 0.000). Its atom type is C.3. It
belongs to the substructure with ID 1, which is named ALA1. The charge associated with the atom is 0.000, and the SYBYL status
bits associated with the atom are BACKBONE, DICT, and DIRECT.
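The atom record above is whitespace-delimited, with the bracketed fields optional from left to right. A minimal sketch of reading it, assuming fields never contain embedded spaces (`parse_mol2_atom` and the key names are illustrative):

```python
def parse_mol2_atom(line: str) -> dict:
    """Parse one @<TRIPOS>ATOM data record into a dictionary."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("an atom record needs at least six fields")
    atom = {
        "atom_id": int(fields[0]),
        "atom_name": fields[1],
        "xyz": (float(fields[2]), float(fields[3]), float(fields[4])),
        "atom_type": fields[5],
    }
    # Optional fields are nested, so each one implies all the earlier ones.
    if len(fields) > 6:
        atom["subst_id"] = int(fields[6])
    if len(fields) > 7:
        atom["subst_name"] = fields[7]
    if len(fields) > 8:
        atom["charge"] = float(fields[8])
    if len(fields) > 9:
        atom["status_bits"] = fields[9].split("|")
    return atom


atom = parse_mol2_atom(
    "1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT")
print(atom)
```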
@<TRIPOS>BOND
bond_id origin_atom_id target_atom_id bond_type [status_bits]
Example:
1 1 2 ar
The example bond has ID number 1 and connects atoms 1 and 2. It is an aromatic bond.
@<TRIPOS>SUBSTRUCTURE
subst_id subst_name root_atom [subst_type [dict_type [chain [sub_type [inter_bonds [status [comment]]]]]]]
Example:
1 BENZENE1 PERM 0 **** ****** 0 ROOT
The substructure has ID 1 and is named BENZENE1. It is of type PERM and is associated with dictionary type 0. The SYBYL status
bits indicate that it is the ROOT substructure.
References
• http://www.tripos.com/data/support/mol2.pdf
• http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php
• Description of Several Chemical Structure File Formats Used by Computer Programs
Developed at Molecular Design Limited. Arthur Dalby et al. J. Chem. Inf. Comput. Sci.
1992, 32, 244-255.
• http://www.chem.ucla.edu/harding/tutorials/stereochem/rsez.pdf
• http://www.chem.ucla.edu/harding/notes/notes_14C_stereo03.pdf

Chemical File Formats for storing chemical data

  • 2. Types of File formats. Elsevier MDL supports a number of file formats for the representation and communication of chemical information.
      • molfiles: Each molfile describes a single molecular structure, which can contain disjoint fragments such as salts.
      • SDfiles: Structure-data files, which contain data for any number of molecules. SDfiles are the primary format for large-scale data transfer between MDL databases.
      • RGfiles: An RGfile describes a single molecular query with Rgroups. Each RGfile is a combination of Ctabs defining the root molecule and each member of each Rgroup in the query.
      • rxnfiles: Reaction files. Each rxnfile contains the structural information for the reactants and products of a single reaction.
      • RDfiles: Reaction-data files. The RDfile is a more general format that can include reactions as well as molecules.
  • 4. Connection Table [Ctab]. A connection table (Ctab) contains information describing the structural relationships and properties of a collection of atoms. The connection table is fundamental to all of the MDL file formats.
      Counts line:
        9 9 0 0 0 0 0 0 0 0999 V2000
      Atom block:
        -1.0200 1.5300 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        -0.5100 2.4100 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        0.5000 2.3900 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        1.0000 3.2700 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        2.0300 3.2700 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
        0.5000 4.1500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        -0.5000 4.1500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        -1.0100 3.2800 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        -2.0300 3.2800 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
      Bond block:
        1 2 1 0
        2 8 1 0
        2 3 2 3
        3 4 1 0
        4 5 2 0
        4 6 1 0
        6 7 2 3
        7 8 1 0
        8 9 2 0
  • 5. Ctab Features. Parts of the Ctab:
      • Counts Line: Important specifications here relate to the number of atoms, bonds, and atom lists, the chiral flag setting, and the Ctab version.
      • Atom Block: Specifies the atomic symbol and any mass difference, charge, stereochemistry, and associated hydrogens for each atom.
      • Bond Block: Specifies the two atoms connected by the bond, the bond type, and any bond stereochemistry and topology (chain or ring properties) for each bond.
      • Properties Block: Provides for future expandability of Ctab features, while maintaining compatibility with earlier Ctab configurations.
  • 6. 1. Counts Line. Format: aaabbblllfffcccsssmmmvvvvvv, where
      • aaa = number of atoms (current max 255)* [Generic]
      • bbb = number of bonds (current max 255)* [Generic]
      • lll = number of atom lists (max 30)* [Query]
      • fff = (obsolete)
      • ccc = chiral flag: 0 = not chiral, 1 = chiral [Generic]
      • sss = number of stext entries [MDL ISIS/Desktop]
      • mmm = number of lines of additional properties, including the M END line; no longer supported, the default is set to 999 [Generic]
      • vvvvvv = Ctab version (V2000 in the examples here)
      The full counts line also carries four obsolete three-character fields between sss and mmm, which is why the second example has four extra zeros before 999.
      Examples:
      • 6 5 0 0 1 0 3 V2000 shows six atoms, five bonds, the CHIRAL flag on, and three lines in the properties block.
      • 9 9 0 0 0 0 0 0 0 0999 V2000 shows nine atoms, nine bonds, and the CHIRAL flag off.
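The fixed-width layout of the counts line can be read by slicing three-character fields. The following is a minimal Python sketch, not a full CTfile reader. The function name `parse_counts_line` and the dictionary keys are illustrative, and the offsets assume the complete V2000 layout, in which four obsolete three-character fields sit between sss and mmm.

```python
def parse_counts_line(line):
    """Slice a V2000 counts line into its three-character fields."""

    def field(start):
        # Blank (obsolete or unused) fields are treated as 0.
        text = line[start:start + 3].strip()
        return int(text) if text else 0

    return {
        "atoms": field(0),               # aaa
        "bonds": field(3),               # bbb
        "atom_lists": field(6),          # lll
        "chiral": field(12) == 1,        # ccc
        "stext_entries": field(15),      # sss
        "property_lines": field(30),     # mmm
        "version": line[33:39].strip(),  # vvvvvv
    }


# Counts line of the nine-atom example, rebuilt with fixed-width spacing:
# two 9s, eight 0s, then 999 and the version stamp.
nine_atom_line = "  9  9" + "  0" * 8 + "999 V2000"
counts = parse_counts_line(nine_atom_line)
print(counts["atoms"], counts["bonds"], counts["chiral"], counts["version"])
```

Slicing by column rather than splitting on whitespace matters here because adjacent fields can touch, as in `0999` in the example line.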
  • 7. 2. Atom Block. The Atom Block is made up of atom lines, one line per atom, with the following format:
      xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee
      Field, meaning, and values:
      • x y z: Atom coordinates.
      • aaa: Atom symbol. Entry in the periodic table, or L for atom list, A, Q, * for unspecified atom, LP for lone pair, or R# for Rgroup label.
      • dd: Mass difference. -3, -2, -1, 0, 1, 2, 3, 4 (0 if the value is beyond these limits).
      • ccc: Charge. 0 = uncharged or value other than these, 1 = +3, 2 = +2, 3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3.
      • sss: Atom stereo parity. 0 = not stereo, 1 = odd, 2 = even, 3 = either or unmarked stereo center.
      • hhh: Hydrogen count + 1. 1 = H0, 2 = H1, 3 = H2, 4 = H3, 5 = H4.
      • bbb: Stereo care box. 0 = ignore stereo configuration of this double bond atom, 1 = stereo configuration of double bond atom must match.
      • vvv: Valence. 0 = no marking (default), 1 to 14 = valence 1 to 14, 15 = zero valence.
      • HHH: H0 designator. 0 = not specified, 1 = no H atoms allowed.
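As an illustration of the atom line layout, here is a short Python sketch that slices the leading fields and decodes the charge code using the table above. The names `parse_atom_line` and `CHARGE_BY_CODE` are hypothetical, and the column offsets assume the standard V2000 layout (three 10-character coordinates, one space, then the symbol); only the first few fields are read.

```python
# Charge code -> formal charge, per the ccc table (code 4 is a doublet radical).
CHARGE_BY_CODE = {0: 0, 1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}


def parse_atom_line(line):
    """Read coordinates, symbol, mass difference, charge and parity."""
    charge_code = int(line[36:39])
    return {
        "x": float(line[0:10]),
        "y": float(line[10:20]),
        "z": float(line[20:30]),
        "symbol": line[31:34].strip(),         # aaa
        "mass_difference": int(line[34:36]),   # dd
        "charge": CHARGE_BY_CODE.get(charge_code, 0),
        "doublet_radical": charge_code == 4,
        "stereo_parity": int(line[39:42]),     # sss
    }


# First atom of the nine-atom example, rebuilt with fixed-width formatting.
example = "%10.4f%10.4f%10.4f %-3s%2d%3d%3d" % (-1.02, 1.53, 0.0, "C", 0, 0, 0)
atom = parse_atom_line(example)
print(atom["symbol"], atom["x"], atom["y"], atom["charge"])
```

The charge is stored as a lookup code rather than a signed number, so a reader has to map it back (for example, code 3 means +1).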
  • 8. 3. Bond Block. The Bond Block is made up of bond lines, one line per bond, with the following format:
      111222tttsssxxxrrrccc
      Field, meaning, and values:
      • 111: First atom number. 1 to number of atoms.
      • 222: Second atom number. 1 to number of atoms.
      • ttt: Bond type. 1 = Single, 2 = Double, 3 = Triple, 4 = Aromatic, 5 = Single or Double, 6 = Single or Aromatic, 7 = Double or Aromatic, 8 = Any.
      • sss: Bond stereo. Single bonds: 0 = not stereo, 1 = Up, 4 = Either, 6 = Down. Double bonds: 0 = use x-, y-, z-coords from atom block to determine cis or trans, 3 = cis or trans (either) double bond.
      • rrr: Bond topology. 0 = Either, 1 = Ring, 2 = Chain.
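A bond line can be decoded the same way. The sketch below is illustrative only: `parse_bond_line`, `BOND_TYPES`, and `TOPOLOGY` are made-up names, and fields missing from a short line are assumed to be 0.

```python
BOND_TYPES = {
    1: "single", 2: "double", 3: "triple", 4: "aromatic",
    5: "single or double", 6: "single or aromatic",
    7: "double or aromatic", 8: "any",
}
TOPOLOGY = {0: "either", 1: "ring", 2: "chain"}


def parse_bond_line(line):
    """Slice a V2000 bond line (111222tttsssxxxrrrccc) into fields."""

    def field(start):
        # Fields missing from a short line are treated as 0.
        text = line[start:start + 3].strip()
        return int(text) if text else 0

    return {
        "atoms": (field(0), field(3)),                 # 111, 222
        "type": BOND_TYPES.get(field(6), "unknown"),   # ttt
        "stereo": field(9),                            # sss
        "topology": TOPOLOGY.get(field(15), "unknown"),  # rrr
    }


# Third bond of the nine-atom example: atoms 2 and 3, double, stereo 3.
bond = parse_bond_line("  2  3  2  3")
print(bond)
```

For this bond the stereo value 3 on a double bond reads as "cis or trans (either)", so the coordinates are not used to fix the geometry.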
  • 9. Mol File. A molfile consists of a header block and a connection table. The slide shows a molfile for alanine alongside its structure, with these annotations: the header block identifies the molfile (molecule name, user's name, program, date, and other miscellaneous information and comments); in the properties block, the charge entries set atom 4 to charge +1 and atom 6 to charge -1, and the single isotope entry sets atom 3 to mass 13.
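The charge and isotope annotations correspond to lines in the properties block. The Python sketch below assumes the V2000 convention of `M  CHG` and `M  ISO` lines, each carrying an entry count followed by atom-number/value pairs. The two example lines are reconstructed from the annotations rather than copied from the slide, and `parse_property_line` is a hypothetical helper.

```python
def parse_property_line(line):
    """Return (kind, {atom_number: value}) for an 'M  CHG' or 'M  ISO' line.

    Returns None for any other line. Splitting on whitespace is a
    simplification of the fixed-width layout.
    """
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "M" or tokens[1] not in ("CHG", "ISO"):
        return None
    count = int(tokens[2])  # number of atom/value pairs on this line
    numbers = [int(t) for t in tokens[3:3 + 2 * count]]
    return tokens[1], dict(zip(numbers[0::2], numbers[1::2]))


# Assumed lines for the alanine example: two charges, one isotope.
charges = parse_property_line("M  CHG  2   4   1   6  -1")
isotopes = parse_property_line("M  ISO  1   3  13")
print(charges)
print(isotopes)
```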
  • 10. Representation of Stereochemistry. What is stereochemistry? See http://www.chemhelper.com/enantiomers.html
  • 12. Representation of Stereochemistry: Bond Block. A bond stereo value of 1 shows a stereo bond drawn up.
  • 13. RGfiles. In RGfiles, lines beginning with $ define the overall structure of the Rgroup query; the molfile header block is embedded in the Rgroup header block. In addition to the primary connection table (Ctab block) for the root structure, a Ctab block defines each member (*m) within each Rgroup (*r).
  • 15. SDfile. An SDfile (structure-data file) contains the structural information and associated data items for one or more compounds. In the file layout, *l is repeated for each line of data, *d for each data item, and *c for each compound.
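The nesting of compounds, data items, and data lines can be sketched in Python as follows. This assumes the usual SDfile conventions, which the slide does not spell out: each compound ends with a `$$$$` line, the molfile part ends with `M  END`, and each data item starts with a header such as `> <NAME>` and ends at a blank line. The function names and the one-atom sample record are invented for illustration.

```python
def parse_record(lines):
    """Split one compound into its molfile text and its data items."""
    # The molfile part runs up to and including the "M  END" line.
    end = next(i for i, l in enumerate(lines) if l.startswith("M  END"))
    molfile = "\n".join(lines[:end + 1])
    data, name = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # Data header, e.g. "> <NAME>": keep the text inside <...>.
            name = line[line.find("<") + 1:line.rfind(">")]
            data[name] = []
        elif line.strip() and name is not None:
            data[name].append(line)   # one entry per line of data (*l)
        else:
            name = None               # a blank line ends the data item (*d)
    return molfile, data


def read_sdfile(text):
    """Return one (molfile, data) pair per compound (*c)."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Made-up single-atom record, repeated to give a two-compound file.
record = "\n".join([
    "water", "  sketch", "",
    "  1  0" + "  0" * 8 + "999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>", "water", "",
    "$$$$",
])
records = read_sdfile(record + "\n" + record)
print(len(records), records[0][1])
```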
  • 17. RXNfile. Rxnfiles contain structural data for the reactants and products of a reaction. In the file layout, *r is repeated for each reactant and *p for each product.
  • 19. RDfiles
      • An RDfile (reaction-data file) consists of a set of editable “records”. Each record defines a molecule or reaction and its associated data.
      • The [RDfile Header] must occur at the beginning of the physical file and identifies the file as an RDfile. A version stamp of 1 is given for future expansion of the format.
      • $DATM: Date/time (M/D/Y, H:m) stamp. This line is treated as a comment and ignored when the file is read.
      • In the file layout, *d is repeated for each data item and *r for each reaction or molecule.
  • 21. Mol2 files from TRIPOS. The format originates from Tripos. It contains atom coordinates, bonds, and substructure information, and supports partial charges and isotopes.
      • Lines 1, 2, 3, 5 and 6 in the example are comments. They contain the molecule name and information about when the molecule was created and last modified.
      • Lines 8, 15, 28, and 41 in the example are Record Type Indicators (RTIs). An RTI indicates the type of data that follows it in a .mol2 file.
      • Lines 9-12, 16-27, 29-40, and 42 are all data records.
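The split between comments, RTIs, and data records can be sketched in Python. This assumes that comment lines start with `#` and that every RTI starts with `@<TRIPOS>`. The function name `split_mol2_sections` and the tiny sample are illustrative and much shorter than the example on the slide.

```python
RTI_PREFIX = "@<TRIPOS>"


def split_mol2_sections(text):
    """Group data records under the RTI that precedes them."""
    sections = []
    for line in text.splitlines():
        if line.startswith(RTI_PREFIX):
            # A new Record Type Indicator opens a new section.
            sections.append((line[len(RTI_PREFIX):].strip(), []))
        elif sections and line.strip() and not line.startswith("#"):
            # Anything else that is not blank or a comment is a data record.
            sections[-1][1].append(line)
    return sections


sample = "\n".join([
    "# Name: example (made-up comment line)",
    "@<TRIPOS>MOLECULE",
    "example",
    "@<TRIPOS>ATOM",
    "1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT",
    "@<TRIPOS>BOND",
    "1 1 2 ar",
])
sections = split_mol2_sections(sample)
print([name for name, _ in sections])
```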
  • 22. Parts of mol2 file
      @<TRIPOS>MOLECULE
        • The first data line is the name of the molecule.
        • The second data line contains the number of atoms, bonds, substructures, features, and sets associated with the molecule.
        • The third data line is the molecule type.
        • The fourth data line gives the type of charges associated with the molecule.
        • The fifth data line contains the internal SYBYL status bits associated with the molecule.
        • The last data line contains any comment that may be associated with the molecule.
      @<TRIPOS>ATOM
        Format: atom_id atom_name x y z atom_type [subst_id [subst_name [charge [status_bit]]]]
        Example: 1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT
        In this example the atom has ID number 1. It is named CA and is located at (-0.149, 0.299, 0.000). Its atom type is C.3. It belongs to the substructure with ID 1, which is named ALA1. The charge associated with the atom is 0.000, and the SYBYL status bits associated with the atom are BACKBONE, DICT, and DIRECT.
      @<TRIPOS>BOND
        Format: bond_id origin_atom_id target_atom_id bond_type [status_bits]
        Example: 1 1 2 ar
        The example bond has ID number 1 and connects atoms 1 and 2. It is an aromatic bond.
      @<TRIPOS>SUBSTRUCTURE
        Format: subst_id subst_name root_atom [subst_type [dict_type [chain [sub_type [inter_bonds [status [comment]]]]]]]
        Example: 1 BENZENE1 PERM 0 **** ****** 0 ROOT
        The substructure has ID 1 and is named BENZENE1. It is of type PERM and is associated with dictionary type 0. The SYBYL status bits indicate that it is the ROOT substructure.
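Because mol2 records are whitespace-separated, the ATOM and BOND formats above map directly onto a token split. A minimal Python sketch follows; the function names and dictionary keys are illustrative, and the optional trailing fields are read only when present.

```python
def parse_mol2_atom(line):
    """Parse one @<TRIPOS>ATOM data record."""
    tokens = line.split()
    atom = {
        "atom_id": int(tokens[0]),
        "atom_name": tokens[1],
        "xyz": tuple(float(t) for t in tokens[2:5]),
        "atom_type": tokens[5],
    }
    # Optional fields are nested, so each is present only if the
    # preceding one is.
    if len(tokens) > 6:
        atom["subst_id"] = int(tokens[6])
    if len(tokens) > 7:
        atom["subst_name"] = tokens[7]
    if len(tokens) > 8:
        atom["charge"] = float(tokens[8])
    if len(tokens) > 9:
        atom["status_bits"] = tokens[9].split("|")
    return atom


def parse_mol2_bond(line):
    """Parse one @<TRIPOS>BOND data record."""
    tokens = line.split()
    return {
        "bond_id": int(tokens[0]),
        "origin_atom_id": int(tokens[1]),
        "target_atom_id": int(tokens[2]),
        "bond_type": tokens[3],
        "status_bits": tokens[4].split("|") if len(tokens) > 4 else [],
    }


atom = parse_mol2_atom(
    "1 CA -0.149 0.299 0.000 C.3 1 ALA1 0.000 BACKBONE|DICT|DIRECT")
bond = parse_mol2_bond("1 1 2 ar")
print(atom["atom_name"], atom["atom_type"], atom["status_bits"])
print(bond["bond_type"])
```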
  • 23. References
      • http://www.tripos.com/data/support/mol2.pdf
      • http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php
      • Arthur Dalby et al. Description of Several Chemical Structure File Formats Used by Computer Programs Developed at Molecular Design Limited. J. Chem. Inf. Comput. Sci. 1992, 32, 244-255.
      • http://www.chem.ucla.edu/harding/tutorials/stereochem/rsez.pdf
      • http://www.chem.ucla.edu/harding/notes/notes_14C_stereo03.pdf