A bunch of Unicode-related functionality was removed from the Python 3.12 C API. Unfortunately for me, there's a very old piece of code (~2010) in our library that uses it, and I need to migrate that code since we're looking to upgrade to 3.12 eventually. One thing I'm specifically struggling with is the removal of the u# format unit. The following piece of code would parse any positional parameters passed to foo (including Unicode strings) and store them in input:
    static PyObject *
    foo(PyObject *self, PyObject *args) {
        Py_UNICODE *input;
        Py_ssize_t length;
        if (!PyArg_ParseTuple(args, "u#", &input, &length)) {
            return NULL;
        }
        ...
    }
However, according to the docs, the u# has been removed:
"Changed in version 3.12: u, u#, Z, and Z# are removed because they used a legacy Py_UNICODE* representation."
and the current code simply throws something like bad-format-character when this is compiled and called from Python.
Py_UNICODE is just wchar_t, so that's easily fixed. But with the removal of u# I am not sure how to get PyArg_ParseTuple to accept Unicode input arguments. Using s# instead of u# does not work since it won't handle anything widechar. How do I migrate this call in Python 3.12?
s# handles Unicode fine, but it gives you UTF-8 rather than wchar_t. If you specifically need a wchar_t representation, you can get one from a string object with PyUnicode_AsWideCharString:
    Py_ssize_t size;
    wchar_t *wchar_representation = PyUnicode_AsWideCharString(stringobj, &size);
    if (!wchar_representation) {
        // An error occurred and an exception is set; handle it here.
    }
    // Do stuff with wchar_representation, then once you're done:
    PyMem_Free(wchar_representation);
Unlike the old Py_UNICODE API, this allocates a new buffer, which you have to free with PyMem_Free when you're done with it.