I have a list of PDB files. I want to extract the ligands of all the files (so, heteroatoms) and save each one separately into PDB files, by using the Bio.PDB module from BioPython.
I tried some solutions, like this one: Remove heteroatoms from PDB , that I tried to adapt to keep the heteroatoms. But all I obtain is files with all the ligand in the same file.
I also tried a thing like this :
def accept_residue(residue):
""" Recognition of heteroatoms - Remove water molecules """
res = residue.id[0]
if res != " ": # Heteroatoms have some flags, that's why we keep only residue with id != " "
if res != "W": # Don't take in consideration the water molecules
return True
def extract_ligands(path):
""" Extraction of the heteroatoms of .pdb files """
for element in os.listdir(path+'/data/pdb'):
i=1
if element.endswith('.pdb'):
if not element.startswith("lig_"):
pdb = PDBParser().get_structure(element[:-4], path+'/data/pdb/'+element)
io = PDBIO()
io.set_structure(pdb)
for model in pdb:
for chain in model:
for residue in chain:
if accept_residue(residue):
io.save("lig_"+element[:-4]+"_"+str(i)+".pdb", accept_residue(residue))
i += 1 # Counter for the result filename
# Main
path = mypath
extract_ligands(path)
Obviously, it raised an error :
AttributeError: 'bool' object has no attribute 'accept_model'
I know that's because of the "accept_residue()" in my "io.save". But I didn't find any logical solution to do what I want to...
At last, I tried a solution like this one, with chain.detach_child() :
...
for chain in model:
for residue in chain:
res = residue.id[0]
if res == " " or res == "W":
chain.detach_child(residue.id)
if len(chain) == 0:
model.detach_child(chain.id)
...
In my mind, it would "detach" all the residues that are not heteroatoms ( res.id[0] == " " ) and all the water ( res.id[0] == "W"). But in fine, all the residues and water are still there and buggy.
So, is it possible to do what I need? ( extract all ligands from all my files and save it one by one separately in PDB files)
You were quite close.
But you have to provide a Select
class as second argument to io.save
. Have a look at the doc comment. It says that this argument should provide accept_model
, accept_chain
, accept_residue
and accept_atom
.
I created a class ResidueSelect
that inherits from Bio.PDB.PDBIO.Select
. That way I only have to override the methods we need. In our case for chain and residues.
Because we only want to save the current residue in the current chain, I provide two respective arguments for the constructor.
import os
from Bio.PDB import PDBParser, PDBIO, Select
def is_het(residue):
res = residue.id[0]
return res != " " and res != "W"
class ResidueSelect(Select):
def __init__(self, chain, residue):
self.chain = chain
self.residue = residue
def accept_chain(self, chain):
return chain.id == self.chain.id
def accept_residue(self, residue):
""" Recognition of heteroatoms - Remove water molecules """
return residue == self.residue and is_het(residue)
def extract_ligands(path):
""" Extraction of the heteroatoms of .pdb files """
for pfb_file in os.listdir(path + '/data/pdb'):
i = 1
if pfb_file.endswith('.pdb') and not pfb_file.startswith("lig_"):
pdb_code = pfb_file[:-4]
pdb = PDBParser().get_structure(pdb_code, path + '/data/pdb/' + pfb_file)
io = PDBIO()
io.set_structure(pdb)
for model in pdb:
for chain in model:
for residue in chain:
if not is_het(residue):
continue
print(f"saving {chain} {residue}")
io.save(f"lig_{pdb_code}_{i}.pdb", ResidueSelect(chain, residue))
i += 1
# Main
path = mypath
extract_ligands(path)
Btw: I tried to improve the readability of your code a little bit in the process...