Hope whoever is reading this is well.
I have an issue with my code. I am trying to convert an isometric SMILE, a descriptor of a molecule, into its atomic groups and neightbours.
My code is below.
import rdkit
from rdkit import Chem
def collect_bonded_atoms(smile):
mol = Chem.MolFromSmiles(smile)
atom_counts = {}
for atom in mol.GetAtoms():
neighbors = [(neighbor.GetSymbol(), bond.GetBondType())
for neighbor, bond in zip(atom.GetNeighbors(), atom.GetBonds())]
neighbors.sort()
key = "{}-{}".format(atom.GetSymbol(), "".join(f"{symbol}{'-' if bond_order == 1 else '=' if bond_order == 2 else '#'}" for symbol, bond_order in neighbors))
atom_counts[key] = atom_counts.get(key, 0) + 1
return atom_counts
smile = "CC(C)(C)C(=O)O"
print(collect_bonded_atoms(smile))
And output is
{'C-C-': 3, 'C-C-C-C-C-': 1, 'C-C-O-O=': 1, 'O-C=': 1, 'O-C-': 1}
Whilst this works well for this molecule's SMILE, though preferably I would've liked it to be structured as,
{'C-C-': 3, 'C-C(-C)(-C)-C-': 1, 'C-C-O(=O)': 1, 'O=C': 1, 'O-C-': 1}
I can't figure out how to fix this. This is a side issue.
The main issue I have is when using this molecule
smile = "CCCCCCCCN1C=C[N+](=C1)C.F[P-](F)(F)(F)(F)F"
My output is very wrong. This is my output.
{'C-C-': 1, 'C-C-C-': 6, 'C-C-N-': 1, 'N-C-C#C#': 2, 'C-C#N#': 2, 'C-N#N#': 1, 'C-N-': 1, 'F-P-': 6, 'P-F-F-F-F-F-F-': 1}
First is that double bonds (bond_order == 2) are shown as a #. Second where it shows the number 1 in the molecule SMILE, that represents a ring. This means that it connects to the next 1. In the output, it is all over the place.
Can I please have some guidance on this? Thanks
Advice on how to fix it, or even better a modification. The side issue isn't as important, but if possible same for it.
You're getting the #
because those bonds are part of an aromatic ring, making their bond type AROMATIC instead of a SINGLE or DOUBLE. In RdKit, AROMATIC bonds have a bond order of 1.5 so all those bonds are ending up in the else
loop.
To fix this, you can do two things:
if-else
condition to acknowledge the 1.5 bond ordermol
object to remove conjugation due to the aromatic ring and make all the bonds static. You can do this by updating your code as:mol = Chem.MolFromSmiles(smile)
Chem.Kekulize(mol)