
Python RecursionError: simple operation crashes with pandas.eval()


I have just read about, and drooled with excitement over, these newly found optimization functions for my pandas-related needs. According to this book:

The DataFrame.eval() method allows much more succinct evaluation of expressions with the columns:

result3 = df.eval('(A + B) / (C - 1)') 
np.allclose(result1, result3)

True

Now to my example:

My dataframe contains around 42000 records and 28 columns, two of which, Date and Heure, are strings.

My goal is to concatenate both columns into one, which I can easily do with df_exade_light["Date"]+df_exade_light["Heure"]. Running %timeit on it returns:

6.07 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But for some reason df.eval('Date + Heure') raises:

RecursionError: maximum recursion depth exceeded

What's more, I applied the solution found in this thread to raise the allowed recursion depth, but then the kernel just crashes.

What's the reason for this? Am I doing something wrong?


The problem can be reproduced with this code:

import pandas as pd

df = pd.DataFrame({'A': ['X','Y'],
                   'B': ['U','V']})

df.eval('A+B')

Solution

  • The problem in your reproducible example is that the columns contain strings. In the link you give, High-Performance Pandas: eval() and query(), all the examples use floats (or ints).

    One way to make your example work is to use python as the engine:

    df.eval('A+B',engine='python')
    

    By default, the engine used in eval is 'numexpr' according to the documentation, and this engine uses the library of the same name, NumExpr, which is a fast numerical expression evaluator for NumPy. Although the previous link presents an example with strings, it does not use the + operation. df.eval('A==B') works, as do the other comparison operators, but df.eval('A+B') does not. You can find more information there, but for strings, support seems limited apart from using engine='python'.

    Going back to your original problem with the date and time columns, I am not sure you can find a solution with the default engine (see here for the supported datatypes).
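As a rough sketch of the workaround above, the snippet below runs the reproducible example through engine='python' and compares it with plain Series concatenation. The second part is an assumption on my side: the question does not show what Date and Heure look like, so the `dates` frame, its values and the format string are hypothetical stand-ins, and `pd.to_datetime` is only one possible follow-up if a real timestamp is wanted.

```python
import pandas as pd

# Reproducible frame from the question.
df = pd.DataFrame({'A': ['X', 'Y'],
                   'B': ['U', 'V']})

# Plain vectorised concatenation, no eval involved.
plain = df['A'] + df['B']

# Same expression through eval, forcing the pure-Python engine.
via_eval = df.eval('A+B', engine='python')

print(plain.tolist())
print(via_eval.tolist())

# Hypothetical stand-in for the Date and Heure columns; the real
# values and formats are not shown in the question.
dates = pd.DataFrame({'Date': ['2019-01-01', '2019-01-02'],
                      'Heure': ['08:30:00', '17:45:00']})

# Concatenate with a separator, then parse with an explicit (assumed) format.
stamps = pd.to_datetime(dates['Date'] + ' ' + dates['Heure'],
                        format='%Y-%m-%d %H:%M:%S')
print(stamps)
```

Since the plain concatenation already takes only a few milliseconds on 42000 rows, eval is unlikely to buy much for string columns; it mostly pays off for large numeric expressions.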