Tags: python, hdf5, import-from-csv, dtype, vaex

vaex Object dtype dtype('O') has no native HDF5 equivalent


I use vaex.from_csv() to convert a CSV file to HDF5:

import vaex
vaex.from_csv("/Users/xxxx/development/vaex/dataAN/testdata1.csv", convert=True)

I get:

IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (53,55) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
ERROR:MainThread:root:error creating dataset for 'EXT_SUB_ACC', with type dtype('O') 
Traceback (most recent call last):
  File "/Users/xxxx/development/vaex/install/vaex/packages/vaex-core/vaex/hdf5/export.py", line 201, in export_hdf5
    array = h5column_output.require_dataset('data', shape=shape, dtype=dtype.newbyteorder(byteorder))
  File "/Users/xxxx/.conda/envs/vaex/lib/python3.7/site-packages/h5py/_hl/group.py", line 191, in require_dataset
    return self.create_dataset(name, *(shape, dtype), **kwds)
  File "/Users/xxxx/.conda/envs/vaex/lib/python3.7/site-packages/h5py/_hl/group.py", line 136, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/Users/xxxx/.conda/envs/vaex/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 119, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5py/h5t.pyx", line 1634, in h5py.h5t.py_create
  File "h5py/h5t.pyx", line 1656, in h5py.h5t.py_create
  File "h5py/h5t.pyx", line 1711, in h5py.h5t.py_create
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

What does this error mean?


Solution

  • A hint is given here:

    IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (53,55) have mixed types.Specify dtype option on import or set low_memory=False.
    

    pandas cannot find a common type for these columns and therefore falls back to dtype=object, which cannot be saved to HDF5 (there is no native machine representation of it). If you force the dtype by passing the dtype argument to from_csv, vaex will pass it on to pandas.
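
A minimal sketch of forcing the dtype, using pandas directly on a small in-memory sample so it runs without the original file. The sample data, the choice of str as the target type, and the assumption that the indices in the warning are zero-based column positions are all illustrative. EXT_SUB_ACC is the column named in the error above, and the real names of columns 53 and 55 would have to be looked up in the actual file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV: EXT_SUB_ACC holds both
# numeric-looking and text values, which is what "mixed types" usually means.
csv_text = "ID,EXT_SUB_ACC\n1,1001\n2,A-17\n3,1003\n"

# Read only the header to map a column position from the warning to a name
# (assumes the warning reports zero-based positions; here position 1).
header = pd.read_csv(io.StringIO(csv_text), nrows=0)
flagged_name = header.columns[1]

# Forcing the dtype makes pandas parse every value in that column as text,
# so the column no longer mixes Python ints and strings.
forced = pd.read_csv(io.StringIO(csv_text), dtype={flagged_name: str})
all_text = all(isinstance(value, str) for value in forced[flagged_name])
print(flagged_name, all_text)

# The same keyword can be given to vaex, which forwards it to pandas
# (the path is a placeholder, so this part is left commented out):
# import vaex
# df = vaex.from_csv("testdata1.csv", convert=True,
#                    dtype={"EXT_SUB_ACC": str})
```

If the column should be numeric instead, the offending values would need to be cleaned or converted first. Forcing str is simply the option that cannot fail on parsing.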