The rapids.ai cudf type is somewhat compatible with pandas, but here is a strange incompatibility. cudf.Series has a .diff() method, but a cudf.DataFrame does not appear to. This is super-annoying (consider, for example, a data frame of stock prices, with columns corresponding to instruments). There are, of course, kludgy ways to get around this (converting to a pandas data frame and back comes to mind), but I wonder what the canonical way is. Any advice?
cuDF Python covers a large segment of the pandas API, but there are some gaps (as you've run into here).
Today, the easiest way to run diff on every column and return a dataframe would be the following:
cudf.DataFrame({col: df[col].diff() for col in df.columns})
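For the stock-price case from the question, the one-liner above could be wrapped in a small helper. This is only a sketch: diff_columns, prices, and the ticker columns are made-up names for illustration, and the fallback to pandas (aliased as xdf) is an assumption added so the snippet also runs on a machine without a GPU.

```python
import pandas as pd

try:
    import cudf as xdf  # GPU path; requires a working RAPIDS install
except Exception:
    xdf = pd  # assumption: fall back to pandas so the sketch still runs on CPU


def diff_columns(df):
    """Apply Series.diff() to each column and rebuild a frame of the same type."""
    return type(df)({col: df[col].diff() for col in df.columns})


# Hypothetical prices: one column per instrument, one row per time step.
prices = xdf.DataFrame({
    "AAPL": [100.0, 101.5, 101.0],
    "MSFT": [200.0, 198.0, 201.0],
})

# Row-over-row change per instrument; the first row has no predecessor,
# so it comes back as missing values.
changes = diff_columns(prices)
print(changes)
```

Because the helper only relies on column access and Series.diff(), which cudf and pandas both provide, the same function works on either kind of frame without a round trip through pandas.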