I want to store a NumPy array in a single pandas DataFrame cell.
This does not work:
import numpy as np
import pandas as pd
bnd1 = np.random.rand(74,8)
bnd2 = np.random.rand(74,8)
df = pd.DataFrame(columns = ["val", "unit"])
df.loc["bnd"] = [bnd1, "N/A"]
df.loc["bnd"] = [bnd2, "N/A"]
But this does:
import numpy as np
import pandas as pd
bnd1 = np.random.rand(74,8)
bnd2 = np.random.rand(74,8)
df = pd.DataFrame(columns = ["val"])
df.loc["bnd"] = [bnd1]
df.loc["bnd"] = [bnd2]
Can someone explain why, and what's the solution?
Edit:
The first returns:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
The complete traceback is below:
> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3185, in ndim(a)
>    3184 try:
> -> 3185     return a.ndim
>    3186 except AttributeError:
>
> AttributeError: 'list' object has no attribute 'ndim'
>
> During handling of the above exception, another exception occurred:
>
> ValueError                                Traceback (most recent call last)
> Cell In[10], line 8
>       6 df = pd.DataFrame(columns = ["val", "unit"])
>       7 df.loc["bnd"] = [bnd1, "N/A"]
> ----> 8 df.loc["bnd"] = [bnd2, "N/A"]
>
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/pandas/core/indexing.py:849, in _LocationIndexer.__setitem__(self, key, value)
>     846 self._has_valid_setitem_indexer(key)
>     848 iloc = self if self.name == "iloc" else self.obj.iloc
> --> 849 iloc._setitem_with_indexer(indexer, value, self.name)
>
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/pandas/core/indexing.py:1835, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
>    1832 # align and set the values
>    1833 if take_split_path:
>    1834     # We have to operate column-wise
> -> 1835     self._setitem_with_indexer_split_path(indexer, value, name)
>    1836 else:
>    1837     self._setitem_single_block(indexer, value, name)
>
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/pandas/core/indexing.py:1872, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
>    1869 if isinstance(value, ABCDataFrame):
>    1870     self._setitem_with_indexer_frame_value(indexer, value, name)
> -> 1872 elif np.ndim(value) == 2:
>    1873     # TODO: avoid np.ndim call in case it isn't an ndarray, since
>    1874     # that will construct an ndarray, which will be wasteful
>    1875     self._setitem_with_indexer_2d_value(indexer, value)
>    1877 elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
>    1878     # We are setting multiple rows in a single column.
>
> File <__array_function__ internals>:200, in ndim(*args, **kwargs)
>
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3187, in ndim(a)
>    3185     return a.ndim
>    3186 except AttributeError:
> -> 3187     return asarray(a).ndim
>
> ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
I'm using pandas 2.0.3 and numpy 1.24.4.
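The problem is not the NumPy array itself but the plain list you assign. As your traceback shows, when the row already exists pandas sets the values column by column and first calls np.ndim(value) on the list [bnd2, "N/A"]. NumPy then tries to turn that list, a (74, 8) array next to a string, into one regular array, and that fails with the "inhomogeneous shape" ValueError. To avoid this, pass the row as a pd.Series or a dictionary, so that pandas aligns each value with its column by label: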
First way, using pd.Series:
df.loc["bnd"] = pd.Series([bnd2, "N/A"], index=["val", "unit"])
Second way, using a dictionary:
df.loc["bnd"] = {"val": bnd2, "unit": "N/A"}
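For reference, the step that fails in the traceback can be reproduced with NumPy alone, without pandas. This is only a minimal sketch of that one call, not pandas' own code, and the names value, error_message and row are just for illustration. It calls np.ndim on the same kind of list, then shows that an explicitly built 1-D object array can hold the same two items.

```python
import numpy as np

bnd = np.random.rand(74, 8)
value = [bnd, "N/A"]  # same kind of list as the one assigned to the row

# np.ndim on a plain list falls back to np.asarray(value).ndim, which has to
# build one regular array out of a (74, 8) array and a string.
try:
    np.ndim(value)
    error_message = None  # assumption: older NumPy versions may only warn here
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# An explicitly built 1-D object array keeps the two items as separate
# elements, so NumPy never tries to merge them into one regular array.
row = np.empty(2, dtype=object)
row[0] = bnd
row[1] = "N/A"
print(np.ndim(row), row[0].shape)
```

With the numpy 1.24.4 from the question, the first print should show the same ValueError message as in the traceback.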
good luck mate