Tags: python, python-3.x, list, dataframe, numexpr

How to query a pandas DataFrame (Python 3) whose column holds lists of integers


I have been stuck on a problem for a few days now, and I would be grateful if someone could help.

I have a dataframe with a column whose items are lists of integers.

For example, a single column dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [[1, 2, 3], [2, 4, 5], [1, 7, 8]]})

>>> df
           a
0  [1, 2, 3]
1  [2, 4, 5]
2  [1, 7, 8]

I would like to query the dataframe to select the rows whose list contains a specific value. The "in" operator doesn't work for this operation, so I defined a function func that I call in the query:

>>> def func(l, v):
...     return l.apply(lambda val: v in val)

When I call the query, it works as expected on Python 3.6.3 (Xubuntu default install with some updates via pip3). For example, it returns the only row containing the value 7:

>>> df.query('@func(a, 7)')
           a
2  [1, 7, 8]

However, when I run it on Python 3.6.4, included with the latest Anaconda release, it fails with the following message: 'Series' objects are mutable, thus they cannot be hashed.

>>> df.query('@func(a, 7)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2297, in query
    res = self.eval(expr, **kwargs)
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2366, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 295, in eval
    ret = eng_inst.evaluate()
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 76, in evaluate
    res = self._evaluate()
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 122, in _evaluate
    _check_ne_builtin_clash(self.expr)
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 31, in _check_ne_builtin_clash
    names = expr.names
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 755, in names
    return frozenset([self.terms.name])
  File "/home/cedric/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 1045, in __hash__
    ' hashed'.format(self.__class__.__name__))
TypeError: 'Series' objects are mutable, thus they cannot be hashed

I would like my function to work with any Python 3 version (>= 3.6). Maybe I'm going about this the wrong way. Any help would be appreciated.

EDIT 1: I'm using pandas 0.22.0 in both cases.

SOLUTION: I found a solution. The problem comes from the query function defaulting to engine='numexpr' under Anaconda. When I set engine='python', it works again.


Solution

  • It seems that the problem comes from the query function defaulting to engine='numexpr' under Anaconda (pandas picks numexpr by default when it is installed). When I set engine='python', the query works again.

    I still can't figure out why it fails with the numexpr engine, but I can live with that.
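The fix described above can be sketched as follows. This is a minimal example that reuses the df and func from the question and passes engine='python' to query. The second selection, a plain boolean mask that bypasses query altogether, is not from the original post; it is shown only as an assumed alternative that does not depend on which engine is installed. The names res_query and res_mask are illustrative.

```python
import pandas as pd

# Same single-column dataframe as in the question
df = pd.DataFrame({'a': [[1, 2, 3], [2, 4, 5], [1, 7, 8]]})


def func(l, v):
    # Boolean Series: True for each row whose list contains v
    return l.apply(lambda val: v in val)


# Force the pure-Python engine so the expression is not handed to numexpr
res_query = df.query('@func(a, 7)', engine='python')

# Assumed alternative (not in the original post): build the mask directly
res_mask = df[df['a'].apply(lambda val: 7 in val)]

print(res_query)
print(res_mask)
```

Both selections should return the single row with index 2. The mask version may be the simpler choice when the condition is an arbitrary Python function anyway, since query adds little in that case.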