I have just read about, and drooled with excitement over, these newly found optimization functions for my pandas-related needs. According to this book:
The DataFrame.eval() method allows much more succinct evaluation of expressions with the columns:
result3 = df.eval('(A + B) / (C - 1)')
np.allclose(result1, result3)
True
Now to my example:
My dataframe contains around 42000 records and 28 columns, two of which, Date and Heure, are strings.
My goal: to concatenate both columns into one, which I can easily do with this piece of code: df_exade_light["Date"]+df_exade_light["Heure"]. Applying %timeit to it returns:
6.07 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
But for some reason df.eval('Date + Heure') raises:
RecursionError: maximum recursion depth exceeded
What's more, I applied the solution found in this thread to raise the allowed stack depth, but the kernel just crashes.
What's the reason for this? Am I doing something wrong?
The problem can be reproduced with this code:
import pandas as pd
df = pd.DataFrame({'A': ['X','Y'],
'B': ['U','V']})
df.eval('A+B')
The problem in your reproducible example is that the columns contain strings. In the link you give about High-Performance Pandas: eval() and query(), all examples use floats (or ints).
One way to make it work with your example is to use python as the engine:
df.eval('A+B',engine='python')
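As a quick sanity check, here is a minimal sketch that compares the result of the python engine with plain column addition on the same two-row frame. The variable names via_eval and via_plain are only illustrative.

```python
import pandas as pd

# Same two-row frame as in the reproducible example.
df = pd.DataFrame({"A": ["X", "Y"], "B": ["U", "V"]})

# String concatenation through eval, forcing the pure-Python engine.
via_eval = df.eval("A + B", engine="python")

# The ordinary vectorized concatenation, for comparison.
via_plain = df["A"] + df["B"]

print(via_eval.tolist())
print(via_eval.tolist() == via_plain.tolist())
```

Note that with engine='python' the expression is evaluated with ordinary Python operators, so you should not expect the speed-up that eval advertises for numeric data.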
By default, the engine used in eval is 'numexpr', according to the documentation, and this engine uses the library of the same name, NumExpr, which is a fast numerical expression evaluator for NumPy. Although the previous link presents an example with strings, it does not use the + operation. If you do df.eval('A==B') it works, as do the other comparison operators, but df.eval('A+B') does not. You can find more information there, but for strings, besides using engine='python', support seems limited.
Going back to your original problem with date and time columns, I am not sure you can find a solution with the default engine (see here for the supported data types).
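If the end goal is a single datetime column, a sketch that stays outside eval could look like the following. The sample values, the space separator and the format string are assumptions, since the question does not show what Date and Heure actually contain. Adjust them to the real data.

```python
import pandas as pd

# Hypothetical sample data; the real Date/Heure formats are not shown in the question.
df_exade_light = pd.DataFrame({
    "Date": ["2019-01-15", "2019-01-16"],
    "Heure": ["08:30:00", "17:45:10"],
})

# Plain vectorized string concatenation, with a space so the result can be parsed.
combined = df_exade_light["Date"] + " " + df_exade_light["Heure"]

# Parse with an explicit format (an assumption matching the sample data above).
timestamps = pd.to_datetime(combined, format="%Y-%m-%d %H:%M:%S")

print(timestamps)
```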