Saturday 2 November 2013

About SMILES





SMILES
The simplified molecular-input line-entry system (SMILES) is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors and converted back into two-dimensional drawings or three-dimensional models of the molecules.

SMILES is widely used in cheminformatics software and chemical databases. Its notation, built from atomic symbols, a small set of intuitive rules and hydrogen-suppressed molecular graphs, is compact and computationally efficient to process. Two variants are commonly distinguished: canonical SMILES and isomeric SMILES.

The term canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation. A common application of canonical SMILES is indexing molecules and ensuring their uniqueness in a database. Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds. A notable feature of these rules is that they allow rigorous partial specification of chirality.
Graph-based definition 

                In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree. 
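The traversal described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular toolkit writes SMILES: the names `graph_to_smiles` and `ring_label` are hypothetical, the molecule is given as a list of atom symbols plus a list of bonds between atom indices, hydrogens are already removed, and bond orders, charges and stereochemistry are ignored.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def ring_label(n: int) -> str:
    # Single digits for the first nine rings; two-digit labels need a '%' prefix.
    return str(n) if n < 10 else f"%{n}"

def graph_to_smiles(symbols: List[str], bonds: List[Tuple[int, int]], start: int = 0) -> str:
    """Depth-first SMILES writer for a connected, hydrogen-suppressed graph (sketch).

    Assumption: no bond symbols are written, and atom symbols are printed as
    given, with no brackets, charges or stereo marks.
    """
    adj: Dict[int, List[int]] = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)

    rank: Dict[int, int] = {}                               # DFS visiting order
    children: Dict[int, List[int]] = {i: [] for i in adj}   # spanning-tree edges
    tree: Set[FrozenSet[int]] = set()
    ring_bonds: Set[FrozenSet[int]] = set()                 # bonds cut to break cycles

    def dfs(u: int) -> None:
        rank[u] = len(rank)
        for v in adj[u]:
            if v not in rank:
                children[u].append(v)
                tree.add(frozenset((u, v)))
                dfs(v)
            elif frozenset((u, v)) not in tree:
                ring_bonds.add(frozenset((u, v)))

    dfs(start)

    # Number the cut bonds in the order their first atom is written; the same
    # label is printed after both atoms of each cut bond.
    labels: Dict[int, List[str]] = {i: [] for i in adj}
    pairs = sorted((tuple(sorted(e, key=lambda x: rank[x])) for e in ring_bonds),
                   key=lambda p: (rank[p[0]], rank[p[1]]))
    for n, (a, b) in enumerate(pairs, start=1):
        labels[a].append(ring_label(n))
        labels[b].append(ring_label(n))

    def emit(u: int) -> str:
        out = symbols[u] + "".join(labels[u])
        kids = children[u]
        for v in kids[:-1]:
            out += "(" + emit(v) + ")"   # every child except the last is a branch
        if kids:
            out += emit(kids[-1])        # the last child continues the chain
        return out

    return emit(start)
```

For example, a six-membered carbon ring (bonds 0-1, 1-2, ..., 5-0) comes out as C1CCCCC1, and a ten-atom graph with naphthalene's connectivity comes out as c12ccccc1cccc2. Starting the traversal from a different atom gives a different but equally valid string, which is why canonical SMILES needs additional rules to choose one. This sketch also never reuses ring labels, which real writers usually do.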





SMILES Branches
Branches are written by enclosing them in parentheses; they can also be nested or stacked.
For example:
CC(O)CC is 2-butanol
OCC(C)C is isobutanol
OC(C)(C)C is tert-butanol (two stacked branches on the central carbon)
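Reading branches back is a stack operation: an opening parenthesis remembers the current atom, and the matching closing parenthesis returns to it. The sketch below assumes a deliberately tiny input language (uppercase organic-subset atoms and parentheses only, with no ring digits, bond symbols or brackets), and `parse_branches` and `heavy_atom_degrees` are hypothetical helper names, not a library API.

```python
from typing import List, Tuple

def parse_branches(smiles: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Parse an acyclic, single-bonded, organic-subset SMILES into atoms and bonds (sketch)."""
    atoms: List[str] = []
    bonds: List[Tuple[int, int]] = []
    stack: List[int] = []  # branch points waiting for their ')'
    prev = -1              # atom the next atom will bond to (-1: none yet)
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            if prev < 0:
                raise ValueError("a branch cannot begin a SMILES string")
            stack.append(prev)          # the branch hangs off the current atom
            i += 1
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            prev = stack.pop()          # resume the main chain at the branch point
            i += 1
        elif ch.isalpha():
            two = smiles[i:i + 2]
            sym = two if two in ("Cl", "Br") else ch   # two-letter organic-subset symbols
            atoms.append(sym)
            if prev >= 0:
                bonds.append((prev, len(atoms) - 1))
            prev = len(atoms) - 1
            i += len(sym)
        else:
            raise ValueError(f"unsupported character {ch!r} at position {i}")
    if stack:
        raise ValueError("unclosed '('")
    return atoms, bonds

def heavy_atom_degrees(smiles: str) -> List[int]:
    """Number of non-hydrogen neighbours of each atom, in the order written."""
    atoms, bonds = parse_branches(smiles)
    degree = [0] * len(atoms)
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return degree
```

The degree lists make the branching visible: heavy_atom_degrees("OC(C)(C)C") reports four heavy-atom neighbours for the central carbon of tert-butanol, while the branched carbon in OCC(C)C (isobutanol) has three.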


SMILES Symbols
·         A SMILES is a string of alphanumeric characters and certain punctuation symbols.
·         It terminates at the first space encountered when read from left to right.
·         The organic subset, whose atoms can be written without square brackets: B, C, N, O, P, S, F, Cl, Br, I
·         Aromatic atoms: lowercase letters (e.g. c, n, o); aliphatic (nonaromatic) atoms use uppercase letters
·         Ring closures are designated with pairs of matching digits, e.g.
c1ccccc1 is benzene, whereas
C1CCCCC1 is cyclohexane
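The symbol rules above are enough to split a simple SMILES into tokens. This is a minimal sketch, assuming no bracket atoms (so no charges, isotopes or explicit hydrogens); the names `TOKEN` and `tokenize` are hypothetical. Cl and Br are listed before the one-letter symbols so that the two-letter symbols are matched first.

```python
import re
from typing import List, Tuple

# Organic-subset atoms (Cl/Br first), aromatic lowercase atoms, ring-closure
# labels (%nn or one digit), branch parentheses and bond symbols.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|%\d\d|\d|[()]|[-=#:]")

def tokenize(smiles: str) -> List[Tuple[str, str]]:
    """Split a bracket-free SMILES into (kind, text) tokens (sketch)."""
    smiles = smiles.split(" ", 1)[0]   # a SMILES ends at the first space
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character {smiles[pos]!r} at position {pos}")
        tok = m.group()
        if tok[0].isdigit() or tok[0] == "%":
            kind = "ring"
        elif tok in ("(", ")"):
            kind = "branch"
        elif tok in ("-", "=", "#", ":"):
            kind = "bond"
        elif tok.islower():
            kind = "aromatic"
        else:
            kind = "aliphatic"
        tokens.append((kind, tok))
        pos = m.end()
    return tokens
```

With this sketch, tokenize("c1ccccc1") yields six aromatic "c" tokens and two ring tokens, whereas tokenize("C1CCCCC1") yields six aliphatic "C" tokens; anything after a space is ignored.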
 




SMILES Cyclic Structures
·         Break one single or one aromatic bond in each ring
·         Number the rings in any order
                   –Designate ring-breaking atoms by the same digit following the atomic symbol.
·         The same number marks the start and the end of a ring; it is entered immediately after the start and end atoms
·         Single digits (normally 1–9) are used; a two-digit label must be preceded by a percent sign, e.g. %10
·         Each number is used in pairs, once to open a ring and once to close it; after a ring is closed, its number may be reused
·         An atom can be associated with two consecutive numbers, e.g. naphthalene: c12ccccc1cccc2
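The pairing rule can be checked mechanically: the first occurrence of a label opens a ring, the next occurrence closes it, and the label is then free again. This is a minimal sketch assuming no bracket atoms (inside brackets a digit could be an isotope or hydrogen count, not a ring label); `ring_closures` is a hypothetical name, and positions are character offsets in the string.

```python
import re
from typing import Dict, List, Tuple

# A ring-closure label is either '%' plus two digits or a single digit.
RING_LABEL = re.compile(r"%(\d\d)|(\d)")

def ring_closures(smiles: str) -> List[Tuple[int, int, int]]:
    """Pair ring-closure labels; return (label, open_pos, close_pos) per ring (sketch)."""
    open_at: Dict[int, int] = {}
    rings: List[Tuple[int, int, int]] = []
    for m in RING_LABEL.finditer(smiles):
        label = int(m.group(1) or m.group(2))
        if label in open_at:
            # Second occurrence closes the ring and frees the label for reuse.
            rings.append((label, open_at.pop(label), m.start()))
        else:
            open_at[label] = m.start()
    if open_at:
        raise ValueError(f"ring label(s) opened but never closed: {sorted(open_at)}")
    return rings
```

For naphthalene, ring_closures("c12ccccc1cccc2") pairs label 1 at offsets 1 and 8 and label 2 at offsets 2 and 13. A string such as "C1CCCCC", where the ring is opened but never closed, raises ValueError, and "C1CC1C1CC1" is accepted because label 1 is reused only after its first ring is closed.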

SMILES Conventions

·         Avoid two consecutive left parentheses if possible
·         Strive for the fewest possible branches
·         Tautomeric bonds are not designated; enter the appropriate form
·         A branch cannot begin a SMILES notation
·         A branch cannot immediately follow a double- or triple-bond symbol
·         Example: C=(CC)C is invalid, but
·         C(=CC)C and C(CC)=C are valid SMILES (they encode 2-butene and 1-butene, respectively)
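The two hard rules in this list, together with balanced parentheses, can be checked with a single scan. This is a minimal sketch: `check_branch_rules` is a hypothetical name, it only tests for "=" and "#" before a branch as the rule is stated here, and it does not attempt the style conventions (fewest branches, avoiding "((").

```python
from typing import List

def check_branch_rules(smiles: str) -> List[str]:
    """Return a list of branch-rule violations; an empty list means none were found (sketch)."""
    problems: List[str] = []
    if smiles.startswith("("):
        problems.append("a branch cannot begin a SMILES string")
    depth = 0
    for i, ch in enumerate(smiles):
        if ch == "(":
            # Rule as stated above: no branch directly after a double or triple bond.
            if i > 0 and smiles[i - 1] in "=#":
                problems.append(f"branch at position {i} follows a bond symbol")
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at position {i}")
                depth = 0
    if depth > 0:
        problems.append("unclosed '('")
    return problems
```

Applied to the examples above, check_branch_rules("C=(CC)C") reports one problem, while C(=CC)C and C(CC)=C return an empty list.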




SMILES Fragments (see table below):

Functional group     SMILES fragment
Nitro                N(=O)(=O)
Sulfonic acid        S(=O)(=O)O
Cyanide/nitrile      C#N
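A fragment is used by writing it directly after the atom it attaches to, so its first atom bonds to that atom. The sketch below attaches each fragment from the table to the last ring atom of benzene; the dictionary `FRAGMENTS` and the function `monosubstituted_benzene` are hypothetical names used only for illustration.

```python
from typing import Dict

# Fragment strings copied from the table above.
FRAGMENTS: Dict[str, str] = {
    "nitro": "N(=O)(=O)",
    "sulfonic acid": "S(=O)(=O)O",
    "cyanide/nitrile": "C#N",
}

def monosubstituted_benzene(group: str) -> str:
    """Write benzene, then the fragment; its first atom bonds to the ring-closing atom."""
    return "c1ccccc1" + FRAGMENTS[group]
```

This gives c1ccccc1N(=O)(=O) for nitrobenzene, c1ccccc1S(=O)(=O)O for benzenesulfonic acid and c1ccccc1C#N for benzonitrile.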

Link to SMILES database here: Click Here
