I am using Cython (installed via pip) to speed up my pandas operations.
However, I get a ChainedAssignmentError warning caused by a discrepancy in reference counts between Cython and standard Python code. When running the following Cythonized script:
import cython
import pandas as pd
import sys
def main():
    df = pd.DataFrame({"A": [1, 2, 3]})
    df["A"] = df["A"].astype(object)
    return df
I get this warning:
FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
The same code runs fine using regular Python.
I can hide the warning, but that's not ideal because it will break when pandas 3.0 is released.
The issue stems from how pandas checks for chained assignment.
pandas looks at the object's reference count, but Cythonized code holds one fewer reference by default.
You can see this by calling sys.getrefcount(df) in Cythonized vs. un-Cythonized code.
You can also see it by adding the following line before calling astype; the extra reference makes the code run without raising the warning:
references = [df]
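To make the reference-count effect concrete, here is a minimal pure-Python sketch. It uses a plain object as a stand-in for the DataFrame (that substitution and the function name are mine, not part of the original script) and only shows that holding the object in a list raises the count reported by sys.getrefcount by one in CPython:

```python
import sys


def refcount_with_extra_reference():
    """Return an object's refcount before and after adding one reference."""
    obj = object()  # hypothetical stand-in for the DataFrame in the question
    before = sys.getrefcount(obj)
    references = [obj]  # the list holds one additional reference to obj
    after = sys.getrefcount(obj)
    return before, after


before, after = refcount_with_extra_reference()
print(after - before)
```

The absolute numbers depend on the interpreter and on how the function is compiled, which is why only the difference is compared here.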
Does anyone know a way to a - fix this internally, or b - alert the pandas team, since their user guide says they support Cython?
To reproduce
pip install Cython
from setuptools import setup
from Cython.Build import cythonize
setup(
    ext_modules=cythonize("run.pyx")
)
python setup.py build_ext --inplace
import run
run.main()
For reference, the ChainedAssignmentError is there to warn you about cases like this:
df["A"][0:3] = 10
where you're essentially doing some_temp[0:3] = 10, which currently changes df but won't in the future.
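The "some_temp" point can be illustrated without pandas. The sketch below uses a hypothetical ToyFrame class (not pandas code) whose item access returns a copy, which is how the warning text describes the intermediate object behaving under Copy-on-Write:

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; this is not pandas code."""

    def __init__(self, columns):
        self._columns = {name: list(values) for name, values in columns.items()}

    def __getitem__(self, name):
        # Hand back a copy, mimicking an intermediate object that behaves as a copy.
        return list(self._columns[name])

    def __setitem__(self, name, values):
        self._columns[name] = list(values)


frame = ToyFrame({"A": [1, 2, 3]})

# Chained form: the slice assignment lands on a temporary copy and is lost.
frame["A"][0:3] = [10, 10, 10]
after_chained = frame["A"]

# Single-step form: modify a column explicitly, then assign it back in one call.
column = frame["A"]
column[0:3] = [10, 10, 10]
frame["A"] = column
after_direct = frame["A"]
```

In real pandas code the single-step equivalent of the chained example would be a .loc assignment on df itself, such as df.loc[0:2, "A"] = 10 for a default integer index. The line in the question, df["A"] = df["A"].astype(object), is already a single-step assignment.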
I can hide the warning, but that's not ideal because it will break when pandas 3.0 is released.
Does anyone know a way a - fix this internally
I don't think there's a way to fix it except by silencing the warning.
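If you do silence it, it's probably worth keeping the filter as narrow as possible. A minimal sketch using only the standard warnings module follows; run_quietly is a hypothetical helper name, and the message prefix is copied from the warning text quoted in the question, so treat the assumption that the message starts with it as unverified:

```python
import warnings

# Assumed prefix of the message, taken from the warning shown in the question.
CHAINED_MSG = "ChainedAssignmentError"


def run_quietly(func):
    """Call func() with only the chained-assignment FutureWarning suppressed."""
    with warnings.catch_warnings():
        # message is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "ignore", message=CHAINED_MSG, category=FutureWarning
        )
        return func()
```

With the setup from the question this would be called as run_quietly(run.main). Other FutureWarnings still get through, and the filter is removed again once the call returns.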
b - alert the pandas team [...]
The other point to make is: this kind of code isn't the sort of code that Cython will accelerate much because it's just a bunch of Python-style calls to Pandas. The examples in their documentation for using Cython are very much about fast access to individual array elements. So for the code you show, you may not be achieving much by compiling it in Cython. It's possible you have other code that does benefit though.