
HDF5 min_itemsize error: ValueError: Trying to store a string with len [##] in [y] column but this column has a limit of [##]!

I am getting the following error when using pandas.HDFStore().append():

ValueError: Trying to store a string with len [150] in [values_block_0] column but this column has a limit of [127]!

Consider using min_itemsize to preset the sizes on these columns

I am creating a pandas DataFrame and appending it to the HDF5 file as follows:

import pandas as pd

store = pd.HDFStore("test1.h5", mode='w')

hdf_key = "one_key"

columns = ["col1", "col2", ... ]

df = pd.DataFrame(...)
df.col1 = df.col1.astype(str)
df.col2 = df.col2.astype(int)
df.col3 = df.col3.astype(str)
store.append(hdf_key, df, data_column=columns, index=False)

I get the error above: "ValueError: Trying to store a string with len [150] in [values_block_0] column but this column has a limit of [127]!"

Afterwards, I print the table description for this key, which outputs:

{
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": StringCol(itemsize=127, shape=(5,), dflt=b'', pos=1),
  "values_block_1": Int64Col(shape=(5,), dflt=0, pos=2),
  "col1": StringCol(itemsize=20, shape=(), dflt=b'', pos=3),
  "col2": StringCol(itemsize=39, shape=(), dflt=b'', pos=4)}

What are values_block_0 and values_block_1?

So, following the StackOverflow question "Pandas pytable: how to specify min_itemsize of the elements of a MultiIndex", I tried:

store.append(hdf_key, df, data_column=columns, index=False, min_itemsize={"values_block_0":250})

This doesn't work, though. Now I get this error:

ValueError: Trying to store a string with len [250] in [values_block_0] column but this column has a limit of [127]!

Consider using min_itemsize to preset the sizes on these columns

What am I doing wrong?

EDIT: The following code produces the error "ValueError: min_itemsize has the key [values_block_0] which is not an axis or data_column":

import pandas as pd
store = pd.HDFStore("test1.h5", mode='w')
hdf_key = "one_key"

my_columns = ["col1", "col2", ... ]

df = pd.DataFrame(...)
df.col1 = df.col1.astype(str)
df.col2 = df.col2.astype(int)
df.col3 = df.col3.astype(str)
store.append(hdf_key, df, data_column=my_columns, index=False, min_itemsize={"values_block_0":350})

Here is the full error:

(python-3) -bash:1008 $ python
Traceback (most recent call last):
  File "", line 50, in <module>
    store.append(hdf_key, dicts_into_df,  data_column=my_columns, index=False, min_itemsize={'values_block_0':350})
  File "/path/lib/python-3/lib/python3.5/site-packages/pandas/io/", line 970, in append
  File "/path/lib/python-3/lib/python3.5/site-packages/pandas/io/", line 1315, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "/path/lib/python-3/lib/python3.5/site-packages/pandas/io/", line 4263, in write
    obj=obj, data_columns=data_columns, **kwargs)
  File "/path/lib/python-3/lib/python3.5/site-packages/pandas/io/", line 3853, in write
  File "/path/lib/python-3/lib/python3.5/site-packages/pandas/io/", line 3535, in create_axes
  File "/path/lib/python-3/lib/python3.5/site-packages/pandas/io/", line 3174, in validate_min_itemsize
    "data_column" % k)
ValueError: min_itemsize has the key [values_block_0] which is not an axis or data_column



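    You have misspelled the data_columns parameter as data_column; it should be data_columns. As a result, none of your columns were stored as data (indexed) columns, so HDFStore packed them into values_block_X columns, one block per dtype. That is what values_block_0 and values_block_1 are: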

    In [70]: store = pd.HDFStore(r'D:\temp\.data\my_test.h5')

    Misspelled parameters are silently ignored (at least in the pandas version used here):

    In [71]: store.append('no_idx_wrong_dc', df, data_column=df.columns, index=False)
    In [72]: store.get_storer('no_idx_wrong_dc').table
    /no_idx_wrong_dc/table (Table(10,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
      "values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
      "values_block_2": StringCol(itemsize=30, shape=(1,), dflt=b'', pos=3)}
      byteorder := 'little'
      chunkshape := (1213,)

    This is the same as not passing data_columns at all:

    In [73]: store.append('no_idx_no_dc', df, index=False)
    In [74]: store.get_storer('no_idx_no_dc').table
    /no_idx_no_dc/table (Table(10,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
      "values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
      "values_block_2": StringCol(itemsize=30, shape=(1,), dflt=b'', pos=3)}
      byteorder := 'little'
      chunkshape := (1213,)

    Now let's spell it correctly:

    In [75]: store.append('no_idx_dc', df, data_columns=df.columns, index=False)
    In [76]: store.get_storer('no_idx_dc').table
    /no_idx_dc/table (Table(10,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "value": Float64Col(shape=(), dflt=0.0, pos=1),
      "count": Int64Col(shape=(), dflt=0, pos=2),
      "s": StringCol(itemsize=30, shape=(), dflt=b'', pos=3)}
      byteorder := 'little'
      chunkshape := (1213,)
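Applied to the question's code, the fix looks roughly like the sketch below: the keyword is spelled data_columns, and min_itemsize is keyed by the real column names (not values_block_0) and passed on the first append. It assumes PyTables is installed (HDFStore needs it); the DataFrame, the file path and the sizes 50 and 250 are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the question's elided DataFrame.
df = pd.DataFrame({
    "col1": ["a", "bb", "ccc"],
    "col2": [1, 2, 3],
    "col3": ["x" * 150, "y", "z"],
})
my_columns = ["col1", "col2", "col3"]

path = os.path.join(tempfile.mkdtemp(), "test1.h5")  # throwaway file for the sketch
with pd.HDFStore(path, mode="w") as store:
    # Plural "data_columns": each listed column is stored under its own name,
    # so min_itemsize can be keyed by those names on the first append.
    store.append("one_key", df, data_columns=my_columns, index=False,
                 min_itemsize={"col1": 50, "col3": 250})

    # A later append with a 240-character string now fits in col3.
    more = pd.DataFrame({"col1": ["d"], "col2": [4], "col3": ["w" * 240]})
    store.append("one_key", more, data_columns=my_columns, index=False)

    # Inspect the fixed column widths that PyTables allocated.
    table = store.get_storer("one_key").table
    col1_size = table.coldescrs["col1"].itemsize
    col3_size = table.coldescrs["col3"].itemsize
    nrows = table.nrows

print(col1_size, col3_size, nrows)  # → 50 250 4
```

The stored width is the larger of the min_itemsize value and the longest string in the first batch, so picking a generous bound up front is what leaves room for later appends.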

    OLD Answer:

    AFAIK min_itemsize only takes effect on the first append, i.e. when the table is created; after that the width of each string column is fixed.


    In [33]: df
       num                 s
    0   11  aaaaaaaaaaaaaaaa
    1   12    bbbbbbbbbbbbbb
    2   13     ccccccccccccc
    3   14       ddddddddddd
    In [34]: store = pd.HDFStore(r'D:\temp\.data\my_test.h5')
    In [35]: store.append('test_1', df, data_columns=True)
    In [36]: store.get_storer('test_1').table.description
    {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "num": Int64Col(shape=(), dflt=0, pos=1),
      "s": StringCol(itemsize=16, shape=(), dflt=b'', pos=2)}
    In [37]: df.loc[4] = [15, 'X'*200]
    In [38]: df
       num                                                  s
    0   11                                   aaaaaaaaaaaaaaaa
    1   12                                     bbbbbbbbbbbbbb
    2   13                                      ccccccccccccc
    3   14                                        ddddddddddd
    In [39]: store.append('test_1', df, data_columns=True)
    ValueError: Trying to store a string with len [200] in [s] column but
    this column has a limit of [16]!
    Consider using min_itemsize to preset the sizes on these columns    

    Now using min_itemsize, but still appending to the existing table:

    In [40]: store.append('test_1', df, data_columns=True, min_itemsize={'s':250})
    ValueError: Trying to store a string with len [250] in [s] column but
    this column has a limit of [16]!
    Consider using min_itemsize to preset the sizes on these columns
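If the data already lives in a table whose string column is too narrow, one workaround is to read the rows back, remove the table, and write everything again so that min_itemsize applies to what is now the first append. A minimal sketch, assuming PyTables is installed; the frames and the file path are small stand-ins, not the ones from the session above.

```python
import os
import tempfile

import pandas as pd

# Small stand-ins for the frames in the session above (assumed, not the originals).
df = pd.DataFrame({"num": [11, 12], "s": ["a" * 16, "b" * 14]})
new_rows = pd.DataFrame({"num": [15], "s": ["X" * 200]}, index=[4])

path = os.path.join(tempfile.mkdtemp(), "my_test.h5")  # throwaway file for the sketch
with pd.HDFStore(path, mode="w") as store:
    store.append("test_1", df, data_columns=True)  # 's' gets itemsize 16 here

    # The width of 's' is fixed once the table exists, so copy the rows out,
    # drop the table, and recreate it with room for longer strings.
    existing = store.select("test_1")
    store.remove("test_1")
    store.append("test_1", pd.concat([existing, new_rows]),
                 data_columns=True, min_itemsize={"s": 250})

    s_size = store.get_storer("test_1").table.coldescrs["s"].itemsize
    nrows = store.get_storer("test_1").table.nrows

print(s_size, nrows)  # → 250 3
```

This pulls the whole table into memory; for a large table, reading it with store.select(key, chunksize=...) and appending the chunks to a new key would likely be the safer route.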

    The following works because it creates a new table (a new key) in the store:

    In [41]: store.append('test_2', df, data_columns=True, min_itemsize={'s':250})

    Check column sizes:

    In [42]: store.get_storer('test_2').table.description
    {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "num": Int64Col(shape=(), dflt=0, pos=1),
      "s": StringCol(itemsize=250, shape=(), dflt=b'', pos=2)}