It is often stated that RPython (a subset of Python) is statically typed. (E.g. on Wikipedia.)
Initially, I wondered how they would add that to Python and thought that they might have added the requirement to add statements such as assert isinstance(arg1, ...) at the beginning of each function (but I couldn't really believe that).
Then I looked at some RPython code and it doesn't really look statically typed at all. In many cases, it might be that the compiler can prove that a function argument can only be of certain types but definitely not in all cases.
E.g., this is the RPython implementation of string.split:
    def split(value, by, maxsplit=-1):
        bylen = len(by)
        if bylen == 0:
            raise ValueError("empty separator")
        res = []
        start = 0
        while maxsplit != 0:
            next = value.find(by, start)
            if next < 0:
                break
            res.append(value[start:next])
            start = next + bylen
            maxsplit -= 1  # NB. if it's already < 0, it stays < 0
        res.append(value[start:len(value)])
        return res
In the PyPy documentation about RPython, it is said: "variables should contain values of at most one type".
So, do function arguments also count as variables? Or in what sense is RPython statically typed? Or is this actually misstated?
So, do function arguments also count as variables?
Of course they do. They always do in pretty much every language.
Or in what sense is RPython statically typed? Or is this actually misstated?
The statement is correct. RPython is not Python. Well, it's a subset of it and can be run as Python code. But when you actually compile RPython code, so much dynamicness is taken away from you (albeit only after import time, so you can still use metaclasses, generate code from strings, etc., which is used to great effect in some modules) that the compiler (which is not the Python compiler, and is also vastly different from traditional compilers; see the associated documentation) can indeed determine that types are used statically. More accurately, code that relies on dynamicness makes it past the parser and everything, but results in a type error at some point during compilation.
In many cases, it might be that the compiler can prove that a function argument can only be of certain types but definitely not in all cases.
Of course not. There's a lot of code that's not statically typed, and quite some statically-typed code that the current annotator can't prove to be statically typed. But when such code is encountered, it's a compilation error, period.
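To make the distinction concrete, here is a small sketch in plain Python. The function names are made up for illustration and are not taken from the PyPy sources. Both functions run fine under a normal Python interpreter. The first keeps x at a single type on every path; the second binds x to an int on one path and to a string on the other, which breaks the "at most one type" rule quoted in the question, so I would expect the annotator to reject it when the code is compiled.

```python
def consistent(flag):
    # x is an int on both branches, so one type can be inferred for it
    if flag:
        x = 1
    else:
        x = 2
    return x + 1


def inconsistent(flag):
    # x is an int on one branch and a str on the other:
    # valid Python, but not statically typed
    if flag:
        x = 1
    else:
        x = "one"
    return x
```

Nothing in the syntax distinguishes the two; the difference only shows up once types are inferred across the branches.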
There are a few points that are important to realize:
Types are inferred, not stated explicitly (well, for the most part; I believe there are a few functions that need assertions to help the annotator). Static typing does not (as you seem to imply in a comment) mean that the type has to be written out (that's called manifest typing); it means that each expression (that includes variables) has a single type that never changes.
All that analysis happens on a whole-program basis! One can't infer a (non-generic) type for a function def add(a, b): return a + b (the arguments might be ints, floats, strings, lists, etc.), but if the function is called with integer arguments (e.g. integer literals or variables that were previously inferred to contain integers), it is determined that a and b (and, by the type of +, the result of add) are integers too.
Not all code in the PyPy repository is RPython. For example, there are code generators (e.g. in rlib.parsing) that run at compile time and produce RPython code, but are not RPython (frequently with a "NOT_RPYTHON" docstring, by the way). Also, large parts of the standard library are written in full Python (mostly taken straight from CPython).
There's a lot of very interesting material on how the whole translation and typing actually works. For example, The RPython Toolchain describes the translation process in general, including type inference, and The RPython Typer describes the type system(s) used.