I'm trying to use the ReplaceSubstructs() function to combine two SMILES strings by replacing a carbon (C) atom in the first molecule with the second molecule. However, no matter which c_options value I pick, RDKit always replaces the first or last C atom, and the atom indices also seem to get shuffled.
Here is my code:
from rdkit import Chem

def combine_smile(smile1, smile2, c_options=1):
    mol1 = Chem.MolFromSmiles(smile1)
    mol2 = Chem.MolFromSmiles(smile2)
    if mol1 is None or mol2 is None:
        raise ValueError("One or both SMILES strings are invalid.")
    index_mol1 = mol_with_atom_index(mol1)
    combined_mol = Chem.ReplaceSubstructs(index_mol1, Chem.MolFromSmarts("[CH3]"), mol2)[c_options]
    combined_smiles = Chem.MolToSmiles(combined_mol)
    print(combined_smiles)

def mol_with_atom_index(mol):
    atoms = mol.GetNumAtoms()
    for idx in range(atoms):
        mol.GetAtomWithIdx(idx).SetProp('molAtomMapNumber', str(mol.GetAtomWithIdx(idx).GetIdx()))
    return mol

if __name__ == "__main__":
    mol1_smiles = "CC(C)(C)Cl"
    mol2_smiles = "CN(C)C"
    combine_smile(mol1_smiles, mol2_smiles)
Output:
CN(C)C[C:1]([CH3:2])([CH3:3])[Cl:4] # for c_options=0
CN(C)C[C:1]([CH3:0])([CH3:3])[Cl:4] # for c_options=1
CN(C)C[C:1]([CH3:0])([CH3:2])[Cl:4] # for c_options=2
Desired Output:
CN(C)C[C:1]([CH3:2])([CH3:3])[Cl:4] # for c_options=0
[CH3:0][C:1](CN(C)C)([CH3:3])[Cl:4] # for c_options=1
[CH3:0][C:1]([CH3:2])(CN(C)C)[Cl:4] # for c_options=2
I'm not a chemist, but to me it looks like it replaces the correct elements: in the first result it replaces [CH3:0], in the second it replaces [CH3:2], and in the last it replaces [CH3:3], as you expect.
The open question is whether the result can be written in the layout you expect. Maybe RDKit rearranges the structure when it generates the output string, for example to produce a form that can exist in nature.
If you only need to display it, you could take the string, use a regex to find all substrings of the form [CH3:number], replace each one with the string CN(C)C, and keep the result as a string.
import re

text = '[CH3:0][C:1]([CH3:2])([CH3:3])[Cl:4]'

found = re.findall(r'\[CH3:\d+\]', text)
print('found:', found)

results = []
for item in found:
    new_item = text.replace(item, 'CN(C)C')
    results.append(new_item)
    print('new_item:', new_item)

#results = [text.replace(item, 'CN(C)C') for item in found]  # shorter

c_options = 0
print(c_options, results[c_options])
Result:
found: ['[CH3:0]', '[CH3:2]', '[CH3:3]']
new_item: CN(C)C[C:1]([CH3:2])([CH3:3])[Cl:4]
new_item: [CH3:0][C:1](CN(C)C)([CH3:3])[Cl:4]
new_item: [CH3:0][C:1]([CH3:2])(CN(C)C)[Cl:4]
0 CN(C)C[C:1]([CH3:2])([CH3:3])[Cl:4]
If you need to reuse it, you can wrap it in a function:
import re

def replace_substring(text, pattern, replacement):
    found = re.findall(pattern, text)
    results = []
    for item in found:
        new_item = text.replace(item, replacement)
        results.append(new_item)
    #results = [text.replace(item, replacement) for item in found]  # shorter
    return results

# --- main ---

text = '[CH3:0][C:1]([CH3:2])([CH3:3])[Cl:4]'
pattern = r'\[CH3:\d+\]'
replacement = 'CN(C)C'

results = replace_substring(text, pattern, replacement)

for c_options in range(3):
    print(c_options, results[c_options])

#for c_options, item in enumerate(results):
#    print(c_options, item)

Result:
0 CN(C)C[C:1]([CH3:2])([CH3:3])[Cl:4]
1 [CH3:0][C:1](CN(C)C)([CH3:3])[Cl:4]
2 [CH3:0][C:1]([CH3:2])(CN(C)C)[Cl:4]
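One caveat with the approach above: str.replace() swaps every occurrence of the matched text, so it only gives one replacement per result because each [CH3:number] label is unique. If the labels were identical (for example plain [CH3] without map numbers), every result would have all groups replaced. A minimal sketch of a position-based variant follows. The name replace_each_match is hypothetical (it is not part of RDKit or the re module), and the output is plain text that is not checked as valid SMILES.

```python
import re

def replace_each_match(text, pattern, replacement):
    """Return one new string per regex match, replacing only that match."""
    results = []
    for match in re.finditer(pattern, text):
        start, end = match.span()
        # Splice the replacement in at this match's position only.
        results.append(text[:start] + replacement + text[end:])
    return results

# Unlabelled groups: all three matches are the identical text '[CH3]'.
text = '[CH3]C([CH3])([CH3])Cl'
results = replace_each_match(text, r'\[CH3\]', 'CN(C)C')

for c_options, item in enumerate(results):
    print(c_options, item)
```

With identical labels, text.replace(item, replacement) would have produced the same fully replaced string three times. Slicing by match position keeps the other two groups untouched.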
You may need to learn more about regex to create more complex patterns.
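As one illustration of a slightly more complex pattern, the sketch below uses a capture group to read the atom-map number out of each label, so a methyl group can be picked by that number instead of by its position in the results list. The helper name replace_by_map_number is hypothetical, and the result is again plain text that is not validated as SMILES.

```python
import re

def replace_by_map_number(text, map_number, replacement):
    """Replace the [CH3:n] label whose map number n equals map_number."""
    def substitute(match):
        # group(1) holds the digits captured after the colon.
        if int(match.group(1)) == map_number:
            return replacement
        return match.group(0)  # leave every other label unchanged
    return re.sub(r'\[CH3:(\d+)\]', substitute, text)

text = '[CH3:0][C:1]([CH3:2])([CH3:3])[Cl:4]'
print(replace_by_map_number(text, 2, 'CN(C)C'))
```

If no label carries the requested map number, the text comes back unchanged.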