I have been stuck on a problem for a few days now, and I would be grateful if someone could help.
I have a dataframe with a column whose items are lists of integers.
For example, a single column dataframe:
>>> df=pd.DataFrame({'a':[[1, 2, 3], [2, 4, 5], [1, 7, 8]]})
>>> df
           a
0  [1, 2, 3]
1  [2, 4, 5]
2  [1, 7, 8]
I would like to run a query on the dataframe to select the rows whose list contains a specific value. The "in" operator doesn't work for this operation.
I defined a function func that I call in a query:
>>> def func(l, v):
...     return l.apply(lambda val: v in val)
When I call the query on Python 3.6.3 (xubuntu default install with some updates via pip3), it works as expected. For example, it returns the only row containing the value 7:
>>> df.query('@func(a, 7)')
           a
2  [1, 7, 8]
However, when I run it on Python 3.6.4, included with the latest Anaconda release, it fails with the following message: 'Series' objects are mutable, thus they cannot be hashed.
>>> df.query('@func(a, 7)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2297, in query
    res = self.eval(expr, **kwargs)
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2366, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 295, in eval
    ret = eng_inst.evaluate()
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 76, in evaluate
    res = self._evaluate()
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 122, in _evaluate
    _check_ne_builtin_clash(self.expr)
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 31, in _check_ne_builtin_clash
    names = expr.names
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 755, in names
    return frozenset([self.terms.name])
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 1045, in __hash__
    ' hashed'.format(self.__class__.__name__))
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I would like my function to work with any Python 3 version (>= 3.6). Maybe I'm doing it the wrong way. Any help would be appreciated.
EDIT 1: I'm using pandas 0.22.0 in both cases.
SOLUTION: I found a solution. The problem seems to come from the query function defaulting to engine='numexpr' when numexpr is installed, as it is with Anaconda. When setting engine='python', it works again.
I still can't figure out why it doesn't work with the numexpr engine, but I can accept this.
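For reference, here is a minimal sketch of the workaround, assuming the same df and func as in the question. The names by_query and by_mask are only illustrative. The second variant skips query entirely and uses a plain boolean mask, which should not depend on the installed engine at all; I have only reasoned about it against the example data above.

```python
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'a': [[1, 2, 3], [2, 4, 5], [1, 7, 8]]})


def func(l, v):
    # l is the Series of lists; returns a boolean Series,
    # True where the row's list contains v.
    return l.apply(lambda val: v in val)


# Workaround from the post: force the pure-Python engine instead of
# letting query pick numexpr when it is installed.
by_query = df.query('@func(a, 7)', engine='python')

# Alternative that avoids query altogether: index with the boolean mask.
by_mask = df[func(df['a'], 7)]
```

Both variants should select the single row whose list contains 7. The mask version is the one I would reach for if the code has to run unchanged across environments.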