
Python effective integer size


In C#, C++, Java, or JavaScript, for example, the effective int size is 32 bits. If we want to calculate with a larger number, say one 70 bits wide, we have to fall back on a software solution (arbitrary-precision arithmetic).

Python's internal integer representation is boundless and rather tricky, and I cannot figure out the most efficient int size for integer arithmetic.

In other words, is there some int size, say 64 bits, up to which ints are used efficiently?

Or does it not matter whether the count is 16, 32, 64, or some arbitrary number of bits, with Python handling all of these ints with the same efficiency?

In short: does Python always use arbitrary-precision arithmetic, or does it use hardware arithmetic for values that fit in 32/64 bits?
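
For concreteness, a 70-bit calculation in plain Python just works, with no explicit big-integer library:

    >>> n = 1 << 70           # 2**70, well past any native 64-bit integer
    >>> n * n + 1             # the built-in operators keep working; no overflow
    1393796574908163946345982392040522594123777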


Solution

  • CPython's int, in Python 3, is represented as a sign-magnitude array of values, where each element of the array holds 15 or 30 bits of the magnitude, on 32 and 64 bit Python builds respectively. This is an implementation detail, but a longstanding one (digits were originally 15 bits on all platforms, but doubling the size and the number of used bits per "digit" turned out to be an easy win on 64 bit systems). CPython has optimizations for ints that fit into a single (or sometimes two) such array elements: it extracts the raw value from the array and performs a single CPU operation, skipping the loops and algorithms the arbitrary-length case requires. On 64 bit builds, that currently means values with a magnitude of 30 bits or less are generally optimized specially (with 60 bit magnitudes occasionally hitting fast paths). The first snippet below shows how to inspect these digit sizes from Python.

    That said, there's rarely a reason to take this into account; CPython interpreter overhead is quite high, and it's hard to imagine a scenario where manually breaking a larger operation into smaller ones (incurring more interpreter overhead to do many small operations at the Python layer) would outweigh the much smaller cost of Python performing the array-based operations (even without the special fast paths) at the C layer. The exceptions to this rule all rely on non-Python-int solutions, such as fixed-size numpy arrays that vectorize the work and largely follow C rules at that point (since numpy arrays are, most of the time, wrappers around raw C arrays); the second snippet below illustrates the difference.
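
To make the digit layout concrete, here is a small inspection script. sys.int_info and sys.getsizeof are standard CPython APIs; the exact byte counts are implementation details of a 64 bit CPython build and may differ between versions:

    import sys

    # Reports the internal digit size: typically bits_per_digit=30 on
    # 64-bit builds of CPython and 15 on 32-bit builds.
    print(sys.int_info)

    # The object footprint grows one internal digit at a time as the
    # magnitude crosses each 30-bit boundary (on a 64-bit build).
    for bits in (1, 30, 31, 60, 61, 90, 91):
        n = 1 << (bits - 1)   # smallest positive value that needs `bits` bits
        print(f"{bits:>2}-bit value -> {sys.getsizeof(n)} bytes")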
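
And to illustrate the numpy exception: with fixed-width dtypes, arithmetic follows C rules and wraps on overflow instead of growing the way a Python int does. A minimal sketch, assuming numpy is installed:

    import numpy as np

    a = np.array([2**31 - 1], dtype=np.int32)  # largest signed 32-bit value
    print(a + 1)             # [-2147483648]: wraps around, C-style fixed-width math
    print((2**31 - 1) + 1)   # 2147483648: the Python int simply grows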