SMILES
The Simplified Molecular Input Line Entry System (SMILES) is a specification for
unambiguously describing the structure of chemical molecules using short ASCII
strings. SMILES strings can be imported by most molecule editors for conversion
back into two-dimensional drawings or three-dimensional models of the
molecules.
SMILES is widely used across chemistry-related fields. Its combination of
atomic symbols, a small set of intuitive rules, and hydrogen-suppressed
molecular graphs makes it compact and computationally efficient. SMILES comes
in two forms: Canonical SMILES and Isomeric SMILES.
The term Canonical SMILES refers to the version of the SMILES
specification that includes rules for ensuring that each distinct chemical
molecule has a single unique SMILES representation. A common application of
Canonical SMILES is for indexing and ensuring uniqueness of molecules in a
database. Isomeric SMILES refers to the version of the SMILES specification
that includes extensions to support the specification of isotopes, chirality,
and configuration about double bonds. A notable feature of these rules is that
they allow rigorous partial specification of chirality.
Graph-based definition
In terms of a graph-based computational
procedure, SMILES is a string obtained by printing the symbol nodes encountered
in a depth-first tree traversal of a chemical graph. The chemical graph is
first trimmed to remove hydrogen atoms and cycles are broken to turn it into a
spanning tree. Where cycles have been broken, numeric suffix labels are
included to indicate the connected nodes. Parentheses are used to indicate
points of branching on the tree.
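The traversal described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not a canonical SMILES writer: the function name `write_smiles` and its input format (a list of atom symbols plus a list of index pairs) are made up for this example, the graph is assumed to be hydrogen-suppressed already, every bond is treated as a single or aromatic bond, and ring labels are limited to single digits.

```python
def write_smiles(atoms, bonds, start=0):
    """Return a SMILES string for a small graph with single/aromatic bonds only.

    atoms: list of atom symbols, e.g. ["C", "C", "O"] (hypothetical input format)
    bonds: list of (i, j) index pairs into atoms
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Pass 1: depth-first search to find the bonds that close rings.
    # Removing these bonds leaves a spanning tree.
    visited = set()
    closures = set()
    digits = {i: [] for i in range(len(atoms))}

    def find_closures(atom, parent):
        visited.add(atom)
        for nb in neighbours[atom]:
            if nb == parent:
                continue
            bond = frozenset((atom, nb))
            if nb in visited:
                if bond not in closures:
                    closures.add(bond)
                    label = len(closures)  # 1, 2, 3, ... (single digits only)
                    digits[atom].append(label)
                    digits[nb].append(label)
            else:
                find_closures(nb, atom)

    find_closures(start, None)

    # Pass 2: print the spanning tree depth-first. Every child except the
    # last one is written as a parenthesised branch.
    def emit(atom, parent):
        text = atoms[atom] + "".join(str(d) for d in digits[atom])
        children = [nb for nb in neighbours[atom]
                    if nb != parent and frozenset((atom, nb)) not in closures]
        for child in children[:-1]:
            text += "(" + emit(child, atom) + ")"
        if children:
            text += emit(children[-1], atom)
        return text

    return emit(start, None)


# 2-butanol: the O on the second carbon becomes a branch.
print(write_smiles(["C", "C", "O", "C", "C"], [(0, 1), (1, 2), (1, 3), (3, 4)]))
# A six-membered ring: one bond is broken and marked with the digit 1.
print(write_smiles(["C"] * 6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]))
```

The string produced depends on the start atom and on the order in which neighbours are listed, which is why a separate canonicalisation step is needed to obtain a unique string.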
SMILES Branches
A branch is represented by enclosing it in parentheses. Branches can be nested
or stacked.
For example:
CC(O)CC is 2-butanol
OCC(C)C is isobutanol
OC(C)(C)C is tert-butanol
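To see how parentheses map back to connectivity, here is a minimal sketch that recovers the bond list from a branched SMILES string with a stack. The helper name `branch_bonds` is hypothetical, and the sketch assumes one-letter atom symbols and no ring digits or bond symbols, which is enough for the three butanol examples.

```python
def branch_bonds(smiles):
    """Return the (i, j) atom-index pairs bonded in a branch-only SMILES string.

    Assumes one-letter atom symbols; ring digits and bond symbols are rejected.
    """
    bonds = []
    stack = []    # atoms to return to when a branch closes
    prev = None   # atom that the next atom will be bonded to
    count = 0     # number of atoms read so far
    for ch in smiles:
        if ch == "(":
            stack.append(prev)       # remember the branch point
        elif ch == ")":
            prev = stack.pop()       # go back to the branch point
        elif ch.isalpha():
            if prev is not None:
                bonds.append((prev, count))
            prev = count
            count += 1
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return bonds


print(branch_bonds("CC(O)CC"))    # 2-butanol
print(branch_bonds("OC(C)(C)C"))  # tert-butanol: three stacked neighbours on atom 1
```

In the tert-butanol string, the two stacked branches and the final C all attach to the same carbon, because each closing parenthesis returns to the atom that opened the branch.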
SMILES Symbols
· A SMILES string consists of alphanumeric characters and certain punctuation symbols.
· It terminates at the first space encountered when read left to right.
· The organic subset: B, C, N, O, P, S, F, Cl, Br, I
· Aliphatic (nonaromatic) atoms are written in uppercase; atoms in an aromatic ring are written in lowercase.
· Designate ring closure with pairs of matching digits, e.g. c1ccccc1 is benzene, whereas C1CCCCC1 is cyclohexane.
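The symbol rules can be illustrated with a small tokenizer. This is a sketch covering only the symbols mentioned in this post (organic-subset atoms, lowercase aromatic atoms, ring digits, parentheses, and the = and # bond symbols); the name `tokenize` and the choice of b, c, n, o, p, s as the aromatic letters are assumptions of the example, and bracket atoms, charges and stereo marks are deliberately left out.

```python
import re

# Two-letter symbols must come first so that "Cl" is not read as "C" + "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[0-9]|[()=#]")


def tokenize(text):
    """Split the SMILES part of text into tokens, stopping at the first space."""
    smiles = text.split(" ", 1)[0]
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def is_aromatic(token):
    """Lowercase atom symbols denote aromatic atoms."""
    return token.isalpha() and token.islower()


print(tokenize("c1ccccc1 benzene"))  # the name after the space is ignored
print(tokenize("ClCCBr"))
```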
SMILES Cyclic Structures
· Break one single or one aromatic bond in each ring.
· Number the rings in any order, designating the ring-breaking atoms by the same digit following the atomic symbol.
· The same number marks the start and end of a ring and is entered immediately after the start and end atoms.
· Only the numbers 1–9 are used.
· A number should appear only twice.
· An atom can be associated with two consecutive numbers, e.g. naphthalene: c12ccccc1cccc2
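A short sketch of how matching digits pair up: the function below walks a SMILES string and reports which atoms each ring digit connects. The name `ring_closures` is hypothetical, and the sketch assumes one-letter atom symbols and single-digit labels; other characters such as parentheses and bond symbols are simply skipped.

```python
def ring_closures(smiles):
    """Return (i, j) atom-index pairs joined by ring-closure digits.

    Raises ValueError if a digit is left without its matching partner.
    """
    open_rings = {}   # digit -> index of the atom that opened the ring
    pairs = []
    atom = -1         # index of the most recently read atom
    for ch in smiles:
        if ch.isalpha():
            atom += 1
        elif ch.isdigit():
            if atom < 0:
                raise ValueError("ring digit before any atom")
            if ch in open_rings:
                pairs.append((open_rings.pop(ch), atom))  # second use closes the ring
            else:
                open_rings[ch] = atom                     # first use opens the ring
    if open_rings:
        raise ValueError(f"unclosed ring digit(s): {sorted(open_rings)}")
    return pairs


print(ring_closures("c1ccccc1"))        # one ring between the first and last atom
print(ring_closures("c12ccccc1cccc2"))  # naphthalene: the first atom opens two rings
```

A string such as C1CCCCC, where the digit 1 appears only once, is rejected because the ring is never closed.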
SMILES Conventions
· Avoid two consecutive left parentheses if possible.
· Strive for the fewest possible branches.
· Tautomeric bonds are not designated; enter the appropriate form.
· A branch cannot begin a SMILES notation.
· A branch cannot immediately follow a double- or triple-bond symbol.
· Example: C=(CC)C is invalid, but C(=CC)C and C(CC)=C are valid SMILES.
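The two hard rules in this list (no leading branch, no branch directly after a bond symbol) are easy to check mechanically. The sketch below uses a hypothetical helper, `check_conventions`, that returns a list of messages; it also flags consecutive left parentheses as an advisory, since that convention is a preference rather than a prohibition. It does not validate anything else about the string.

```python
def check_conventions(smiles):
    """Return a list of convention problems found in a SMILES string."""
    problems = []
    if smiles.startswith("("):
        problems.append("a branch cannot begin a SMILES notation")
    for i in range(1, len(smiles)):
        if smiles[i] != "(":
            continue
        if smiles[i - 1] in "=#":
            problems.append(f"branch directly after a bond symbol at position {i}")
        elif smiles[i - 1] == "(":
            problems.append(f"advisory: consecutive left parentheses at position {i}")
    return problems


for candidate in ["C=(CC)C", "C(=CC)C", "C(CC)=C"]:
    print(candidate, check_conventions(candidate))
```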
SMILES Fragments (see table below):
Fragment | SMILES |
Nitro | N(=O)(=O) |
Sulfonic acid | S(=O)(=O)O |
Cyanide/nitrile | C#N |