I have three trajectory replicas (XTC) of a membrane protein in a simulated physiological environment (water, ions, membrane...) loaded in an MDAnalysis (2.2.0) Universe. I want to save three additional XTC files that contain only the trajectory of the protein (the protein atoms), one for each of the original XTC trajectories. When I iterate through each of the three MDAnalysis Readers contained in the Universe, the first saved trajectory seems to be correct, but the other two have the same coordinates in all frames. The starting, complete trajectories are correct. If my starting point is necessarily a Universe with the three Readers, how do I do this correctly and efficiently?
Code:
import MDAnalysis as mda
u = mda.Universe("11159_dyn_117.pdb", "11156_trj_117.xtc", "11157_trj_117.xtc", "11158_trj_117.xtc")
protein = u.select_atoms("protein")
protein.write("protein.pdb")
for num, reader in enumerate(u.trajectory.readers, 1):
    with mda.Writer(f"{num}.xtc", protein.n_atoms) as w:
        for ts in reader.trajectory:
            w.write(protein.atoms)
# Then check the generated individual trajectories by loading them in
# Universes and checking the positions array. I checked them in PyMOL.
Files downloadable at: https://submission.gpcrmd.org/dynadb/dynamics/id/117/ (model file and trajectory files)
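Answer:
You can write a trajectory directly from an AtomGroup with the AtomGroup.write(name, frames=trajectory_iterator) method. The loop in the question fails because an AtomGroup always takes its coordinates from the Universe's own trajectory (u.trajectory); stepping through one of the underlying readers directly does not reliably update the coordinates the Universe sees, so iterate over slices of u.trajectory instead. To get the start/stop frames of each subtrajectory in the chained trajectory, use the private, undocumented ChainReader._start_frames attribute.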
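Because _start_frames is private and undocumented, it may help to see what it is expected to hold. The following is a minimal sketch in plain Python, assuming a chain loaded without continuous=True so that no frames are dropped: the start frames are then a running sum of the per-replica frame counts with a leading 0, and the last entry is the total number of frames. The helpers chain_start_frames and locate_frame are hypothetical illustrations, not MDAnalysis API; locate_frame shows why a slice between two consecutive entries covers exactly one replica.

```python
from bisect import bisect_right
from itertools import accumulate


def chain_start_frames(n_frames_per_replica):
    """Return [0, n0, n0 + n1, ...]: the global index at which each replica
    starts, with the total number of frames as the last entry.

    Hypothetical helper; assumes a non-continuous chain (no dropped frames).
    """
    return [0, *accumulate(n_frames_per_replica)]


def locate_frame(start_frames, global_frame):
    """Map a global frame index of the chain to (replica index, local frame)."""
    if not 0 <= global_frame < start_frames[-1]:
        raise IndexError(f"frame {global_frame} is out of range")
    # the replica is the last one whose start frame is <= global_frame
    replica = bisect_right(start_frames, global_frame) - 1
    return replica, global_frame - start_frames[replica]


sf = chain_start_frames([98, 98])  # two 98-frame trajectories, as in the example
print(sf)                          # [0, 98, 196]
print(locate_frame(sf, 97))        # (0, 97): last frame of the first replica
print(locate_frame(sf, 98))        # (1, 0): first frame of the second replica
```

In practice you read the real attribute from the Universe rather than recomputing it, as in the following example: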
import MDAnalysis as mda
# example data (requires the MDAnalysisTests package)
from MDAnalysisTests import datafiles as data
# create a chained trajectory and select some atoms
u = mda.Universe(data.PSF, [data.DCD, data.DCD])
protein = u.select_atoms("protein")
# get start/stop frames:
# array([ 0, 98, 196]) for this example
sf = u.trajectory._start_frames
# write each subtrajectory of the chained trajectory
# to a new file in a different format (only containing
# the atoms of the selected AtomGroup)
for i, (start, stop) in enumerate(zip(sf[:-1], sf[1:])):
    protein.atoms.write(f"protein_{i}.xtc", frames=u.trajectory[start:stop])
This will produce the trajectories protein_0.xtc and protein_1.xtc. If you want to load them, don't forget to create a file that contains a minimal topology for the selection:
protein.write("protein.gro")
so that you can load the new trajectories with:
p0 = mda.Universe("protein.gro", "protein_0.xtc")
p1 = mda.Universe("protein.gro", "protein_1.xtc")
Notes
- I use ChainReader._start_frames instead of the raw lengths of the trajectories because the start frames are updated appropriately if the ChainReader detects overlapping frames when using continuous=True.
- Instead of frames=u.trajectory[start:stop] one can also use frames=slice(start, stop).
- Instead of protein.atoms.write() one normally writes protein.write(), which is equivalent and shorter, but I wanted to make clear that it is always correct to go through .atoms, in particular if one wants to write a whole Universe, in which case one would use u.atoms.write().
- Instead of AtomGroup.write() one could also use explicit trajectory writing, where you open a trajectory for writing with
      with mda.Writer("protein_1.xtc", protein.n_atoms) as W:
          for ts in u.trajectory[start:stop]:
              W.write(protein)
  which provides more control over every step but is more verbose.
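The question numbers its outputs 1.xtc, 2.xtc and 3.xtc. Below is a minimal sketch of a hypothetical helper, split_plan (not part of MDAnalysis), that pairs such 1-based filenames with the frames=slice(start, stop) form from the notes. It only does the bookkeeping and is plain Python, so it can be checked without any trajectory files.

```python
def split_plan(start_frames, pattern="{}.xtc", first=1):
    """Pair each replica's output filename with the slice of the chained
    trajectory that covers it.

    start_frames: boundaries such as u.trajectory._start_frames,
                  i.e. [0, n0, n0 + n1, ...] (list or NumPy array).
    pattern:      format string that receives the replica number.
    first:        number given to the first replica (1 gives 1.xtc, ...).
    """
    plan = []
    # consecutive boundaries give the [start, stop) range of each replica
    bounds = zip(start_frames[:-1], start_frames[1:])
    for num, (start, stop) in enumerate(bounds, first):
        plan.append((pattern.format(num), slice(int(start), int(stop))))
    return plan


for name, frames in split_plan([0, 98, 196]):
    print(name, frames)
# 1.xtc slice(0, 98, None)
# 2.xtc slice(98, 196, None)
```

With the Universe from the question, a loop such as for name, frames in split_plan(u.trajectory._start_frames): protein.write(name, frames=frames) should then write one protein-only XTC per replica; like the answer's example, it relies on the private _start_frames attribute.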