python pandas numpy

Store numpy array in pandas dataframe


I want to store a numpy array in a single pandas DataFrame cell.

This does not work:

import numpy as np
import pandas as pd
bnd1 = np.random.rand(74,8)
bnd2 = np.random.rand(74,8)

df = pd.DataFrame(columns = ["val", "unit"])
df.loc["bnd"] = [bnd1, "N/A"]  # creates the row: no error
df.loc["bnd"] = [bnd2, "N/A"]  # overwrites the row: raises ValueError

But this does:

import numpy as np
import pandas as pd
bnd1 = np.random.rand(74,8)
bnd2 = np.random.rand(74,8)

df = pd.DataFrame(columns = ["val"])
df.loc["bnd"] = [bnd1]
df.loc["bnd"] = [bnd2]

Can someone explain why, and what's the solution?

Edit:

The first returns:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

The complete traceback is below:

> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3185, in ndim(a)
>    3184 try:
> -> 3185     return a.ndim
>    3186 except AttributeError:
>
> AttributeError: 'list' object has no attribute 'ndim'
>
> During handling of the above exception, another exception occurred:
>
> ValueError                                Traceback (most recent call last)
> Cell In[10], line 8
>       6 df = pd.DataFrame(columns = ["val", "unit"])
>       7 df.loc["bnd"] = [bnd1, "N/A"]
> ----> 8 df.loc["bnd"] = [bnd2, "N/A"]
>
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/pandas/core/indexing.py:849, in _LocationIndexer.__setitem__(self, key, value)
>     846 self._has_valid_setitem_indexer(key)
>     848 iloc = self if self.name == "iloc" else self.obj.iloc
> --> 849 iloc._setitem_with_indexer(indexer, value, self.name)
>
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/pandas/core/indexing.py:1835, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
>    1832 # align and set the values
>    1833 if take_split_path:
>    1834     # We have to operate column-wise
> -> 1835     self._setitem_with_indexer_split_path(indexer, value, name)
>    1836 else:
>    1837     self._setitem_single_block(indexer, value, name)
>
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/pandas/core/indexing.py:1872, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
>    1869 if isinstance(value, ABCDataFrame):
>    1870     self._setitem_with_indexer_frame_value(indexer, value, name)
> -> 1872 elif np.ndim(value) == 2:
>    1873     # TODO: avoid np.ndim call in case it isn't an ndarray, since
>    1874     #  that will construct an ndarray, which will be wasteful
>    1875     self._setitem_with_indexer_2d_value(indexer, value)
>    1877 elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
>    1878     # We are setting multiple rows in a single column.
>
> File <__array_function__ internals>:200, in ndim(*args, **kwargs)
>
> File ~/anaconda3/envs/py38mats/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3187, in ndim(a)
>    3185     return a.ndim
>    3186 except AttributeError:
> -> 3187     return asarray(a).ndim
>
> ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

I'm using pandas 2.0.3 and numpy 1.24.4


Solution

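  • Your traceback shows where it breaks. The first assignment creates the row and goes through; the second overwrites an existing row, and on that path pandas calls np.ndim(value) on your list, which falls back to asarray(a).ndim. NumPy then tries to turn [bnd2, "N/A"] (a 74x8 array plus a string) into one regular array, can't, and raises the ValueError. A one-element list like [bnd2] converts without complaint (to shape (1, 74, 8)), so the single-column version never triggers this error. To fix it, don't hand pandas a bare list; label the values by column with either a pd.Series or a dictionary: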

    First way, using pd.Series:

    df.loc["bnd"] = pd.Series([bnd2, "N/A"], index=["val", "unit"])
    

    or

    Second way, using a dictionary:

    df.loc["bnd"] = {"val": bnd2, "unit": "N/A"}
    

    good luck mate
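
To see the root cause without pandas in the way, here is a minimal NumPy-only sketch of the conversion that the traceback ends in (asarray(a).ndim). The helper name converts_cleanly is made up for this illustration and is not part of NumPy or pandas.

```python
import numpy as np

bnd2 = np.random.rand(74, 8)


def converts_cleanly(value):
    """Hypothetical helper: True if np.asarray accepts `value` as a regular array."""
    try:
        np.asarray(value)
        return True
    except ValueError:
        return False


# One array in a list stacks into a regular array of shape (1, 74, 8).
single = converts_cleanly([bnd2])

# An array next to a string is ragged. NumPy 1.24+ raises ValueError here;
# older releases only emitted a deprecation warning and built an object array.
mixed = converts_cleanly([bnd2, "N/A"])

# An explicitly allocated object array can hold both items as opaque objects,
# because nothing has to be merged into one regular shape.
cell = np.empty(2, dtype=object)
cell[0] = bnd2
cell[1] = "N/A"

print(single, mixed, cell.shape)
```

With the numpy 1.24.4 from the question, mixed comes out False, which is the same ValueError that surfaces through df.loc. The object array at the end only illustrates why an object container can hold the pair; it is not a claim about what pandas does internally with a Series or dict.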