Tags: python, rdkit, cheminformatics

How to generate a graph from a SMILES molecule representation?


I have a dataset of molecules represented as SMILES strings, and I would like to represent them as graphs. Is there a general way to do this? For instance, given the string CC(C)(C)c1ccc2occ(CC(=O)Nc3ccccc3F)c2c1, how can I convert it to a graph representation, meaning an adjacency matrix and an atom vector? I see questions about generating SMILES from graphs, and I know RDKit has MolFromSmiles, but I can't find anything that builds a graph from a SMILES string.


Solution

  • You could try pysmiles. Starting from the SMILES string, you can create a NetworkX graph and generate the desired objects with code along these lines:

    from pysmiles import read_smiles
    import networkx as nx
        
    smiles = 'C12=C3C4=C5C6=C1C7=C8C9=C1C%10=C%11C(=C29)C3=C2C3=C4C4=C5C5=C9C6=C7C6=C7C8=C1C1=C8C%10=C%10C%11=C2C2=C3C3=C4C4=C5C5=C%11C%12=C(C6=C95)C7=C1C1=C%12C5=C%11C4=C3C3=C5C(=C81)C%10=C23'
    mol = read_smiles(smiles)
        
    # atom vector (every atom is carbon in this fullerene)
    print(mol.nodes(data='element'))
    # adjacency matrix (nx.to_numpy_matrix was removed in NetworkX 3.0)
    print(nx.to_numpy_array(mol))
    

    If a rough visualization is good enough, you can also plot the molecule with

    import matplotlib.pyplot as plt
    # label each node with its element symbol
    elements = nx.get_node_attributes(mol, name="element")
    nx.draw(mol, with_labels=True, labels=elements, pos=nx.spring_layout(mol))
    plt.gca().set_aspect('equal')
    plt.show()
    

    Fullerenes are fun to plot :)

    [Image: the fullerene graph drawn with NetworkX]
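
Since the question mentions RDKit, the same two objects can probably be obtained without an extra library. The following is a minimal sketch applied to the molecule from the question; it assumes RDKit and NumPy are installed, and the variable names are only illustrative.

```python
import numpy as np
from rdkit import Chem

smiles = "CC(C)(C)c1ccc2occ(CC(=O)Nc3ccccc3F)c2c1"
mol = Chem.MolFromSmiles(smiles)
if mol is None:  # MolFromSmiles returns None when parsing fails
    raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")

# Atom vector: element symbols (or atomic numbers) in RDKit's atom order.
atom_symbols = [atom.GetSymbol() for atom in mol.GetAtoms()]
atomic_numbers = np.array([atom.GetAtomicNum() for atom in mol.GetAtoms()])

# Adjacency matrix over heavy atoms, in the same atom order (0/1 entries).
# Passing useBO=True would store bond orders instead.
adjacency = Chem.GetAdjacencyMatrix(mol)

print(atom_symbols)
print(adjacency)
```

Hydrogens are implicit here, so the matrix covers heavy atoms only; `Chem.AddHs(mol)` before building the matrix would include them.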