In C#, C++, Java, or JavaScript, for example, the effective int size is 32 bits. If we want to calculate with larger numbers, say 70 bits, we have to fall back on software support (arbitrary-precision arithmetic).
Python's internal integer representation is unbounded and rather tricky, and I cannot figure out what the most efficient int size is for integer arithmetic.
In other words, is there some int size, say 64 bits, for efficient int usage? Or does it not matter whether it is 16, 32, 64, or some arbitrary number of bits, and Python handles all of these ints with the same efficiency?
In short: does Python always use arbitrary-precision arithmetic, or does it use hardware arithmetic for 32/64 bit values?
Modern CPython is most efficient when operands have a magnitude below 30 bits; a few specific operations also have fast paths for values up to 60 bits of magnitude. Everything else uses generalized arbitrary-precision math. On older Python versions, or on unusual custom builds (especially those built for 32 bit systems), the general limit is 15 bits, with 60 bits for certain specific operations.
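A rough way to see that threshold is a micro-benchmark. The sketch below (assuming a 64 bit CPython build) just times a multiplication with operands under 30 bits against one with roughly 70 bit operands; the exact numbers will vary by version and machine:

    import timeit

    # Operands that fit in a single 30-bit "digit" vs. ones that need three digits.
    small_a, small_b = (1 << 29) - 3, (1 << 29) - 15
    big_a, big_b = (1 << 70) + 1, (1 << 70) + 7

    n = 1_000_000
    t_small = timeit.timeit("a * b", globals={"a": small_a, "b": small_b}, number=n)
    t_big = timeit.timeit("a * b", globals={"a": big_a, "b": big_b}, number=n)

    print(f"<=30-bit multiply: {t_small:.3f}s for {n} ops")
    print(f"~70-bit multiply:  {t_big:.3f}s for {n} ops")

Typically the small-operand case comes out somewhat faster, but both timings are dominated by interpreter overhead.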
That said, there's generally no reason to try to replace fewer large operations with many smaller ones; Python-level code is slower than the C-level code that implements int, so you're almost always better off just doing the math "naturally" and not worrying about operand magnitudes.
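To make that concrete, here's an illustrative sketch (the split_mul helper is purely hypothetical, not something CPython does at the Python level) that decomposes a 60 bit multiplication into 30 bit pieces. It produces the same result, but timing it against a plain a * b will generally show the manual version losing badly, because every extra Python-level operation pays interpreter overhead:

    import timeit

    MASK = (1 << 30) - 1

    def split_mul(a, b):
        # Schoolbook split into 30-bit halves: a = ah*2**30 + al, b = bh*2**30 + bl
        ah, al = a >> 30, a & MASK
        bh, bl = b >> 30, b & MASK
        return (ah * bh << 60) + ((ah * bl + al * bh) << 30) + al * bl

    a = (1 << 59) + 12345
    b = (1 << 59) + 67890
    assert split_mul(a, b) == a * b

    print(timeit.timeit("a * b", globals=globals(), number=1_000_000))
    print(timeit.timeit("split_mul(a, b)", globals=globals(), number=1_000_000))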
CPython's int, in Python 3, is represented as a sign-magnitude array of values, where each element in the array holds 15 or 30 bits of the magnitude. The default "limb" size depends on the Python version and the architecture's pointer size: on 3.x prior to 3.11 it's 30 bit limbs on 64 bit builds (where sizeof(void*) >= 8) and 15 bit limbs on 32 bit builds (tweakable with a configure switch at build time), while newer versions default to 30 bit limbs everywhere.
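You can check the limb layout of the interpreter you're actually running with sys.int_info (a CPython-specific detail; other implementations may not expose it):

    import sys

    print(sys.int_info)                 # e.g. bits_per_digit=30, sizeof_digit=4 on 64 bit builds
    print(sys.int_info.bits_per_digit)  # 30 or 15, depending on the build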
The long type in the Python 2 days followed the same rules (15 bits in 2.7.3 and earlier, changing to the 3.1-3.10 behavior in 2.7.4), but since there was a separate int type (defined in terms of a C long), the performance of long didn't matter as much.
All of this is an implementation detail (as you can see from it changing between minor releases on multiple occasions, and even between micro releases in the 2.7 timeframe), but a longstanding one (it was originally 15 bits everywhere; doubling the size and number of used bits per "digit" in the array turned out to be an easy win on 64 bit systems). It has optimizations for ints that fit into a single (or sometimes two) such array values: it extracts the raw value from the array and performs a single CPU operation, skipping the loops and algorithms that apply to the arbitrary-length case. On 64 bit builds of CPython, that currently means values with a magnitude of 30 bits or less are generally optimized specially (with 60 bit magnitudes occasionally having fast paths).
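One indirect way to watch the digit array grow is sys.getsizeof; the exact byte counts differ between CPython versions and builds, so treat this as an illustrative sketch rather than a guarantee:

    import sys

    # Each time the magnitude crosses another bits_per_digit boundary, the
    # object needs one more digit, so its size grows by sizeof_digit bytes.
    for value in (0, 1, 2**30 - 1, 2**30, 2**60, 2**90):
        print(f"{value.bit_length():3d} bits -> {sys.getsizeof(value)} bytes")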
That said, there's rarely a reason to take this into account; CPython interpreter overhead is pretty high, and it's hard to imagine a scenario where manually breaking a larger operation down into smaller ones (incurring more interpreter overhead for many small operations at the Python layer) would outweigh the much smaller cost of Python doing the array-based operations (even without the special fast paths) at the C layer. The exceptions to this rule all rely on non-Python-int solutions, e.g. using fixed-size numpy arrays to vectorize the work and largely following C rules at that point (since numpy arrays are wrappers around raw C arrays most of the time).
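For completeness, a small sketch of that trade-off: numpy's fixed-width integers behave like C integers (fixed 64 bit storage, wraparound on overflow in array operations), while Python's int just keeps growing:

    import numpy as np

    a = np.array([2**62, 2**62], dtype=np.int64)
    print(a + a)          # wraps around like a C int64 (values go negative)
    print(2**62 + 2**62)  # Python int: exact, grows past 64 bits as needed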