python rdkit cheminformatics

Converting SMILES to a chemical name or IUPAC name using RDKit or another Python module


Is there a way to convert SMILES to either chemical name or IUPAC name using RDKit or other python modules?

I couldn't find anything very helpful in other posts.

Thank you very much!


Solution

  • As far as I am aware, this is not possible using RDKit, and I do not know of any Python modules with this ability. If you are OK with using a web service, you could use the NCI resolver.

    Here is a naive implementation of a function to retrieve an IUPAC name from a SMILES string:

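    from urllib.parse import quote

    import requests


    CACTUS = "https://cactus.nci.nih.gov/chemical/structure/{0}/{1}"


    def smiles_to_iupac(smiles):
        rep = "iupac_name"
        # Percent-encode the SMILES: characters such as '#' (triple bond)
        # would otherwise be misread as URL syntax.
        url = CACTUS.format(quote(smiles), rep)
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text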
    
    
    print(smiles_to_iupac('c1ccccc1'))
    print(smiles_to_iupac('CC(=O)OC1=CC=CC=C1C(=O)O'))
    
    [Out]:
    BENZENE
    2-acetyloxybenzoic acid
    

    You could easily extend it to return other representations, although the function isn't exactly fast, since every lookup is a separate HTTP request.
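    As a rough sketch of such an extension, the function can take the representation as a parameter. The helper names `cactus_url`, `resolve` and `describe` are made up for illustration, and the representation names `stdinchikey` and `formula` are assumed to be accepted by the resolver alongside `iupac_name`. This version uses the standard library's `urllib` instead of `requests` so it runs without extra dependencies.

```python
from urllib.parse import quote
from urllib.request import urlopen

CACTUS = "https://cactus.nci.nih.gov/chemical/structure/{0}/{1}"


def cactus_url(identifier, representation):
    """Build the resolver URL, percent-encoding the structure identifier."""
    # Without encoding, '#' (a triple bond in SMILES) would start a URL fragment.
    return CACTUS.format(quote(identifier), representation)


def resolve(identifier, representation="iupac_name", timeout=10):
    """Query the resolver and return the response body as stripped text.

    Raises urllib.error.HTTPError if the resolver answers with an error
    status, for example when it has no result for the identifier.
    """
    url = cactus_url(identifier, representation)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def describe(smiles, representations=("iupac_name", "stdinchikey", "formula")):
    """Look up several representations; this makes one HTTP request per entry."""
    return {rep: resolve(smiles, rep) for rep in representations}
```

    Calling `describe('c1ccccc1')` would return a dictionary keyed by representation name. Because each entry costs one request, it is worth caching results if the same structures are looked up repeatedly.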

    Another solution is to use PubChem. You can use its API through the Python package pubchempy. Bear in mind that a search may return multiple compounds.

    import pubchempy
    
    
    # SMILES to look up
    smiles = 'O=C(NCc1ccc(C(F)(F)F)cc1)[C@@H]1Cc2[nH]cnc2CN1Cc1ccc([N+](=O)[O-])cc1'
    compounds = pubchempy.get_compounds(smiles, namespace='smiles')
    # The search may return several compounds; take the first match
    match = compounds[0]
    print(match.iupac_name)
    
    [Out]:
    (6S)-5-[(4-nitrophenyl)methyl]-N-[[4-(trifluoromethyl)phenyl]methyl]-3,4,6,7-tetrahydroimidazo[4,5-c]pyridine-6-carboxamide
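    Since the search may return multiple compounds, a slightly more defensive variant can collect every distinct name instead of trusting `compounds[0]`. This is only a sketch: the helper names `iupac_names` and `smiles_to_iupac_names` are hypothetical, and it assumes that a result without a name exposes `iupac_name` as `None`.

```python
from types import SimpleNamespace


def iupac_names(compounds):
    """Return the distinct, non-empty IUPAC names of the compounds, in order."""
    names = []
    for compound in compounds:
        name = getattr(compound, "iupac_name", None)
        if name and name not in names:
            names.append(name)
    return names


def smiles_to_iupac_names(smiles):
    """Search PubChem by SMILES and return all distinct IUPAC names found."""
    # Imported here so iupac_names() can be tried without pubchempy installed.
    import pubchempy

    return iupac_names(pubchempy.get_compounds(smiles, namespace="smiles"))


# Offline check with stand-in objects (made-up data, not PubChem output).
stand_ins = [
    SimpleNamespace(iupac_name="benzene"),
    SimpleNamespace(iupac_name=None),
    SimpleNamespace(iupac_name="benzene"),
]
print(iupac_names(stand_ins))  # ['benzene']
```

    An empty list then signals that nothing usable was found, which avoids an `IndexError` on an empty result.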