python, cython, python-typing

Python 3 Cython: function __annotations__ yields unicode instead of str


In my code base, I use Cython to build shared object files from my modules. When I access a function's __annotations__, I get different results depending on whether the module was compiled with Cython, and I am wondering why this is the case:

Minimal example:

I tried to produce a minimal reproducible example and came up with the following two files, main.py and setup.py, which I create in the same directory. Requirements: pip install Cython setuptools.

main.py

import setup

def test(name: str): pass

if __name__ == '__main__':
    print(test.__annotations__, setup.test.__annotations__)

setup.py

from Cython.Build import cythonize
from setuptools import setup, Extension

def test(name: str): pass

if __name__ == '__main__':
    setup(python_requires='>=3.6', ext_modules=cythonize([Extension('setup', ['setup.py']), ], language_level="3"))

Execution

python main.py
> {'name': <class 'str'>}
> {'name': <class 'str'>}

python setup.py build_ext --inplace
python main.py
> {'name': <class 'str'>}
> {'name': 'unicode'}

I would have expected __annotations__ to yield str when using the shared object file as well, but it yields 'unicode' instead. Why is this the case?

I use Python 3.9.2 and Cython version 0.29.21.


Solution

  • This is really more of a bug report than a question, so it probably belongs on the Cython bug tracker rather than here. Fortunately, something similar has already been reported.

    There are a few things going on here:

    1. Cython still aims to support Python 2.7, so the code it generates has to work on both Python 2 and Python 3. While annotations are a Py3 feature, the code that gets the type name is general and used in a number of places. It therefore picks unicode rather than str for strings that it knows are definitely unicode, so that both versions are supported.
    2. Your choice of language_level=3 also affects it: strings are unicode regardless of whether the code runs on Python 2 or 3. If you use language_level=3str, Cython uses "native" strings instead and thus returns str (i.e. bytes on Py2, unicode on Py3).
    3. At least on Cython 3 (alpha, at the moment), Cython implements PEP 563, and thus the annotations are always stored as strings rather than types. That is why you get 'unicode' rather than <class 'unicode'>. This reflects the future behaviour of Python. I didn't think this change was in Cython 0.29.x, so I'm a little surprised that you see it here - it's probably a shortcut that's accidentally "right" in this case.
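As a sketch of point 2, this is the setup.py from the question with only the language_level value changed to "3str". The file and module names are taken from the question. I have not checked the exact __annotations__ output this produces on 0.29.21, so treat it as something to try rather than a confirmed fix.

```python
from Cython.Build import cythonize
from setuptools import setup, Extension

def test(name: str): pass

if __name__ == '__main__':
    setup(
        python_requires='>=3.6',
        ext_modules=cythonize(
            [Extension('setup', ['setup.py'])],
            # "3str": Python 3 semantics, but unprefixed strings stay "native" str
            language_level="3str",
        ),
    )
```

Rebuild with python setup.py build_ext --inplace and run python main.py again to compare the two dictionaries.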

    The forthcoming Cython 3 release aims to improve the treatment of annotations (and other introspection features) so that it is closer to Python's behaviour. The 0.29 branch probably won't see changes to this.
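If you need real types at runtime in the meantime, one possible workaround is to resolve the string annotations yourself. The sketch below is pure Python and only simulates the compiled function by annotating with the string 'unicode'. The names compiled_like and resolve_annotations are made up for illustration, and I am assuming that a Cython-compiled function can be passed to typing.get_type_hints in the same way, which I have not verified.

```python
import typing

# Stand-in for the Cython-compiled function: the annotation is stored as
# the string 'unicode', matching the output shown in the question.
def compiled_like(name: "unicode"):
    pass

def resolve_annotations(func):
    """Resolve string annotations, treating 'unicode' as an alias for str."""
    # localns supplies names that the function's own globals do not define.
    return typing.get_type_hints(func, localns={"unicode": str})

print(compiled_like.__annotations__)       # {'name': 'unicode'}
print(resolve_annotations(compiled_like))  # {'name': <class 'str'>}
```

Mapping 'unicode' to str is only sensible on Python 3, where the two mean the same thing.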