
Calling __new__ when making a subclass of tuple


In Python, when subclassing tuple, the __new__ function is called with self as an argument. For example, here is a paraphrased version of PySpark's Row class:

class Row(tuple):
    def __new__(self, args):
        return tuple.__new__(self, args)

But help(tuple) shows no self argument to __new__:

  __new__(*args, **kwargs) from builtins.type
      Create and return a new object.  See help(type) for accurate signature.

and help(type) just says the same thing:

__new__(*args, **kwargs)
      Create and return a new object.  See help(type) for accurate signature.

So how does self get passed to __new__ in the Row class definition?

Is it possible to view the source of tuple.__new__ so I can see the answer for myself?

My question is not a duplicate of this one because in that question, all discussion refers to __new__ methods that explicitly have self or cls as first argument. I'm trying to understand

  1. Why the tuple.__new__ method does not have self or cls as first argument.
  2. How I might go about examining the source code of the tuple class, to see for myself what's really going on.

Follow-up: Moderators closed this old question as a duplicate of this one. But it's not a duplicate. Look at the accepted answer on this question and note how little overlap it has with the answers in the claimed duplicate, in terms of the information provided.


Solution

    The correct signature of tuple.__new__

    Functions and types implemented in C often can't be introspected, so help() falls back to a generic signature like that one.

    The correct signature of tuple.__new__ is:

    __new__(cls[, sequence])
    

    For example:

    >>> tuple.__new__(tuple)
    ()
    >>> tuple.__new__(tuple, [1, 2, 3])
    (1, 2, 3)
    

    Not surprisingly, this is exactly like calling tuple(), except that you have to write tuple twice.


    The first argument of __new__

    Note that the first argument of __new__ is always the class, not the instance. In fact, the role of __new__ is to create and return the new instance.

    The special method __new__ is a static method.

    I mention this because your Row.__new__ names its first argument self. The name of the argument is not important (except when using keyword arguments), but beware that self will be Row or a subclass of Row, not an instance. The general convention is to name the first argument cls instead of self.
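
A minimal sketch of this point, assuming Python 3. The subclass TimestampedRow is hypothetical and exists only to show that the first argument follows whichever class is being instantiated:

```python
class Row(tuple):
    def __new__(cls, args):
        # No instance exists yet: cls is the class being instantiated.
        print("creating an instance of", cls.__name__)
        return tuple.__new__(cls, args)


class TimestampedRow(Row):  # hypothetical subclass, for illustration only
    pass


row = Row([1, 2, 3])             # cls is Row
sub = TimestampedRow([4, 5, 6])  # cls is TimestampedRow
```

In both calls the value received by __new__ is a class object, and the instance only exists once tuple.__new__ returns.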


    Back to your questions

    So how does self get passed to __new__ in the Row class definition?

    When you call Row(...), Python automatically calls Row.__new__(Row, ...).
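
As a rough illustration, assuming Python 3, the implicit and explicit forms below build equal objects of the same type. The normal call additionally invokes __init__ on the result, which does nothing for this class:

```python
class Row(tuple):
    def __new__(cls, args):
        return tuple.__new__(cls, args)


implicit = Row([1, 2, 3])               # Python passes Row as the first argument
explicit = Row.__new__(Row, [1, 2, 3])  # the same call spelled out by hand

print(implicit == explicit, type(explicit) is Row)
```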

    • Is it via *args?

    You can write your Row.__new__ as follows:

    class Row(tuple):
        def __new__(*args, **kwargs):
            return tuple.__new__(*args, **kwargs)
    

    This works and there's nothing wrong with it. It's useful when you only need to forward the arguments unchanged.

    • Does __new__ have some subtlety where its signature can change with context?

    No. The only special thing about __new__ is that it is implicitly a static method, so the class is never bound automatically and must be passed explicitly.
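
One way to check this for yourself, sketched for CPython 3: the class dictionary holds a staticmethod wrapper even though no @staticmethod decorator was written, and looking the attribute up on the class yields a plain, unbound function. The variable names are illustrative only:

```python
import types


class Row(tuple):
    def __new__(cls, args):
        return tuple.__new__(cls, args)


# What the class body actually stored under the name __new__.
stored = Row.__dict__["__new__"]
is_static = isinstance(stored, staticmethod)

# Attribute lookup unwraps the staticmethod and binds nothing.
is_plain_function = isinstance(Row.__new__, types.FunctionType)

print(is_static, is_plain_function)
```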

    • Or, is the documentation mistaken?

    I'd say that it is incomplete or ambiguous.

    • Why the tuple.__new__ method does not have self or cls as first argument.

    It does have one. It just doesn't appear in help(tuple.__new__), because functions and methods implemented in C often don't expose that information.

    • How I might go about examining the source code of the tuple class, to see for myself what's really going on.

    The file you are looking for is Objects/tupleobject.c. Specifically, you are interested in the tuple_new() function:

    static char *kwlist[] = {"sequence", 0};
    /* ... */
    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O:tuple", kwlist, &arg))
    

    Here "|O:tuple" means: the function name used in error messages is "tuple", and it accepts one optional argument (| marks the start of the optional arguments, O stands for a Python object). The optional argument may be set via the keyword argument sequence.


    About help(type)

    For reference, you were looking at the documentation of type.__new__, whereas the part that matters is the first four lines of help(type).

    For type.__new__(), the correct signature is the signature of type():

    class type(object)
     |  type(object_or_name, bases, dict)
     |  type(object) -> the object's type
     |  type(name, bases, dict) -> a new type
    

    But this is not relevant, as tuple.__new__ has a different signature.


    Remember super()!

    Last but not least, try to use super() instead of calling tuple.__new__() directly.
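
A minimal sketch of the super() version, assuming Python 3 (zero-argument super). Because __new__ is static, cls still has to be passed explicitly:

```python
class Row(tuple):
    def __new__(cls, args):
        # super() resolves to tuple here, following the MRO of cls.
        return super().__new__(cls, args)


row = Row(["a", "b"])
print(row, type(row).__name__)
```

This keeps Row working if another class is later inserted between Row and tuple in the inheritance chain.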