I have some data which looks something like this:
Generated by trjconv : P/L=1/400 t= 0.00000
11214
1P1 aP1 1 80.48 35.36 4.25
2P1 aP1 2 37.45 3.92 3.96
11210LI aLI11210 61.61 19.15 3.25
11211LI aLI11211 69.99 64.64 3.17
11212LI aLI11212 70.73 11.64 3.38
11213LI aLI11213 62.67 16.16 3.44
11214LI aLI11214 3.22 9.76 3.39
61.42836 61.42836 8.47704
I have managed to write the data to the required groups, except the last line. I want to write this line to a group /particles/box. This specific line is repeated in every frame, as you can see in the data file above. So far the code has been designed in a way that it simply ignores this line. I tried a few approaches but keep getting the following error:
ValueError: Shape tuple is incompatible with data
The last line is time dependent, i.e., it fluctuates a bit with every time frame. I want this data to be linked with the step and time datasets that are already defined in /particles/lipids/positions/step. Here is the code:
import struct
import numpy as np
import h5py
import re
# First part: convert the .gro file to .h5.
csv_file = 'com'
fmtstring = '7s 8s 5s 7s 7s 7s'
fieldstruct = struct.Struct(fmtstring)
parse = fieldstruct.unpack_from
# Format for footer
fmtstring1 = '1s 1s 5s 7s 7s 7s'
fieldstruct1 = struct.Struct(fmtstring1)
parse1 = fieldstruct1.unpack_from
with open(csv_file, 'r') as f, \
     h5py.File('xaa_trial.h5', 'w') as hdf:
    # open group for position data
    ## Particles group with the attributes
    particles_grp = hdf.require_group('particles/lipids/positions')
    box_grp = particles_grp.create_group('box')
    dim_grp = box_grp.create_group('dimension')
    dim_grp.attrs['dimension'] = 3
    bound_grp = box_grp.create_group('boundary')
    bound_grp.attrs['boundary'] = ['periodic', 'periodic', 'periodic']
    edge_grp = box_grp.create_group('edges')
    edge_ds_time = edge_grp.create_dataset('time', dtype='f', shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
    edge_ds_step = edge_grp.create_dataset('step', dtype=np.uint64, shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
    edge_ds_value = None
    ## H5MD group with the attributes
    #hdf.attrs['version'] = 1.0 # global attribute
    h5md_grp = hdf.require_group('h5md/version/author/creator')
    h5md_grp.attrs['version'] = 1.0
    h5md_grp.attrs['author'] = 'rohit'
    h5md_grp.attrs['creator'] = 'known'
    # datasets with known sizes
    ds_time = particles_grp.create_dataset('time', dtype="f", shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
    ds_step = particles_grp.create_dataset('step', dtype=np.uint64, shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
    ds_value = None
    step = 0
    while True:
        header = f.readline()
        m = re.search("t= *(.*)$", header)
        if m:
            time = float(m.group(1))
        else:
            print("End Of File")
            break
        # get number of data rows, i.e., number of particles
        nparticles = int(f.readline())
        # read data lines and store in array
        arr = np.empty(shape=(nparticles, 3), dtype=np.float32)
        for row in range(nparticles):
            fields = parse( f.readline().encode('utf-8') )
            arr[row] = np.array((float(fields[3]), float(fields[4]), float(fields[5])))
        if nparticles > 0:
            # create a resizable dataset upon the first iteration
            if not ds_value:
                ds_value = particles_grp.create_dataset('value', dtype=np.float32,
                    shape=(0, nparticles, 3), maxshape=(None, nparticles, 3),
                    chunks=(1, nparticles, 3), compression='gzip', shuffle=True)
                #edge_data = bound_grp.create_dataset('box_size', dtype=np.float32, shape=(0, nparticles, 3), maxshape=(None, nparticles, 3), compression='gzip', shuffle=True)
            # append this sample to the datasets
            ds_time.resize(step + 1, axis=0)
            ds_step.resize(step + 1, axis=0)
            ds_value.resize(step + 1, axis=0)
            ds_time[step] = time
            ds_step[step] = step
            ds_value[step] = arr
        footer = parse1( f.readline().encode('utf-8') )
        dat = np.array(footer)
        print(dat)
        arr1 = np.empty(shape=(1, 3), dtype=np.float32)
        edge_data = bound_grp.create_dataset('box_size', data=dat, dtype=np.float32, compression='gzip', shuffle=True)
        step += 1
#=============================================================================
Your code has a number of small errors when reading and converting the "footer" line. I modified the code and got it working, but I'm not sure it does exactly what you want. I used the same group and dataset definitions. So, the footer data is written to this dataset:
/particles/lipids/positions/box/boundary/box_size
That comes from the following group and dataset definitions:
particles_grp = hdf.require_group('particles/lipids/positions')
box_grp = particles_grp.create_group('box')
bound_grp = box_grp.create_group('boundary')
edge_data = bound_grp.create_dataset('box_size'....
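As a sanity check after the conversion, you can walk the finished file and print every group and dataset path; a quick inspection snippet (not part of the conversion itself):

import h5py

# print every group/dataset path in the finished file
with h5py.File('xaa_trial.h5', 'r') as hdf:
    hdf.visititems(lambda name, obj: print(name, '->', obj))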
Corrections are required in several places. First, you need to change the definition of parse1 to match the 3 fields:
# Format for footer
# FROM:
fmtstring1 = '1s 1s 5s 7s 7s 7s'
# TO:
fmtstring1 = '10s 10s 10s'
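To see what that format does, here is a standalone check using the footer values from your sample file (the exact leading spaces assume three fixed 10-character fields, which is what '10s 10s 10s' expects):

import struct

fieldstruct1 = struct.Struct('10s 10s 10s')
parse1 = fieldstruct1.unpack_from

# a footer line padded to three 10-character fields (30 bytes total)
line = b'  61.42836  61.42836   8.47704'
footer = parse1(line)
print(footer)                       # (b'  61.42836', b'  61.42836', b'   8.47704')
print([float(x) for x in footer])   # [61.42836, 61.42836, 8.47704]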
Next, you need to modify where and how the box_size dataset is created. Create it like the others: as an extensible dataset (with the maxshape=() parameter), above the while True: loop. This is what I did:
edge_ds_step = edge_grp.create_dataset('step', dtype=np.uint64, shape=(0,), maxshape=(None,), compression='gzip', shuffle=True)
# Create empty 'box_size' dataset here
edge_data = bound_grp.create_dataset('box_size', dtype=np.float32, shape=(0,3), maxshape=(None,3), compression='gzip', shuffle=True)
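If the maxshape mechanics are unfamiliar: a dataset created with maxshape=(None, 3) starts empty and can grow one row per frame. A minimal sketch of the pattern, with a throwaway file name:

import numpy as np
import h5py

with h5py.File('resize_demo.h5', 'w') as f:
    ds = f.create_dataset('box_size', dtype=np.float32,
                          shape=(0, 3), maxshape=(None, 3))
    for frame in range(3):
        ds.resize(ds.shape[0] + 1, axis=0)  # grow along the frame axis
        ds[-1, :] = [61.4, 61.4, 8.5]       # write the newest row
    print(ds.shape)                         # (3, 3)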
Finally, here is the modified code to:
1. Parse the footer string to a tuple,
2. Map the tuple to an np.array of floats, shape=(1,3),
3. Resize the dataset, and finally
4. Load the array into the dataset.
footer = parse1( f.readline().encode('utf-8') )
dat = np.array(footer).astype(float).reshape(1,3)
new_size = edge_data.shape[0]+1
edge_data.resize(new_size, axis=0)
edge_data[new_size-1:new_size,:] = dat
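After conversion you can read the box data back frame by frame. And since you mentioned wanting this data linked to the existing step and time datasets: one option is an HDF5 hard link, so both groups share a single dataset. This is only a sketch of that idea (the 'a' mode, the deletion of the separate edge datasets, and the link targets are my assumptions, not something your code already does):

import h5py

with h5py.File('xaa_trial.h5', 'a') as hdf:
    pos = hdf['particles/lipids/positions']
    box_size = pos['box/boundary/box_size']
    print(box_size.shape)   # (n_frames, 3)
    print(box_size[0])      # box dimensions of the first frame

    # share the existing step/time datasets with the edges group via hard links
    edges = pos['box/edges']
    for name in ('step', 'time'):
        if name in edges:
            del edges[name]      # drop the separate (empty) dataset
        edges[name] = pos[name]  # hard link: same data, two paths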