[SOLVED] Strange result when powering pandas integer series

The result when powering a pandas integer Series seems wrong.

# Standard Python
42**42
# 150130937545296572356771972164254457814047970568738777235893533016064

import pandas as pd

# Pandas series, float dtype
s = pd.Series([12, 42], index=range(2), dtype=float)
s**42
# 0    2.116471e+45
# 1    1.501309e+68
# dtype: float64

# Pandas series, integer dtype
s = pd.Series([12, 42], index=range(2), dtype=int)
s**42
# 0                      0
# 1    4121466560160202752
# dtype: int64

How come?

Solution

Python integers have arbitrary precision. Pandas integer columns are backed by NumPy int64 values, whose maximum is 9223372036854775807 (2**63 - 1). Anything larger silently wraps around (overflows) instead of raising an error:

import numpy as np
# Explicit int64: the default integer type is platform dependent on older NumPy
np.array([12, 42], dtype=np.int64)**42
# array([                  0, 4121466560160202752])
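
The wrapped value is not arbitrary: int64 arithmetic keeps only the low 64 bits of the exact result and reads them as a signed (two's complement) number. A minimal sketch that reproduces NumPy's output from Python's exact integers, using a hypothetical helper `wrap_int64` (it is not part of NumPy):

```python
import numpy as np


def wrap_int64(n: int) -> int:
    """Hypothetical helper: what int64 arithmetic keeps of an exact int.

    Keeps the low 64 bits of n and reinterprets them as a signed
    value in the range [-2**63, 2**63 - 1].
    """
    r = n % 2**64                           # keep the low 64 bits
    return r - 2**64 if r >= 2**63 else r   # reinterpret as signed


print(np.iinfo(np.int64).max)               # 9223372036854775807

exact = [12**42, 42**42]                    # Python ints, exact
wrapped = (np.array([12, 42], dtype=np.int64) ** 42).tolist()

print([wrap_int64(n) for n in exact])       # [0, 4121466560160202752]
print([wrap_int64(n) for n in exact] == wrapped)  # True
```

12**42 = 2**84 * 3**42 is a multiple of 2**64, which is why its low 64 bits, and therefore the int64 result, are exactly 0.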

Your numbers are simply too big to fit in an int64, so pandas/NumPy give you the wrapped-around value instead.
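
If you need the exact integers, one option (a sketch, not the only approach) is to store plain Python ints in an object-dtype Series. The power is then computed element by element with Python's arbitrary-precision arithmetic, which is exact but much slower than vectorized int64 math:

```python
import pandas as pd

# dtype=object keeps plain Python ints, so ** never overflows
s = pd.Series([12, 42], index=range(2), dtype=object)
result = s**42

print(result[1] == 42**42)   # True: exact value, no wrap-around
```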

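NB. Floating-point numbers, as their name indicates, have a floating point: a float64 uses 11 bits for the exponent (largest exponent 1023, i.e. values up to about 1.8e308) and 52 bits for the fraction. So they can represent very large values, but only to about 15-16 significant digits.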
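
A quick way to check both halves of that trade-off (large range, limited precision) with `np.finfo` and plain Python floats:

```python
import numpy as np

info = np.finfo(np.float64)
print(info.nexp, info.nmant)   # 11 52: exponent bits, fraction bits
print(info.max > 1e308)        # True: the range easily covers 42**42

exact = 42**42                 # exact Python int
approx = float(exact)          # nearest float64
print(int(approx) == exact)    # False: the low-order digits are lost

# Beyond 2**53, consecutive integers are no longer all representable
print(float(2**53 + 1) == float(2**53))   # True
```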

To give you a visual representation, here is a graph of x**10 for the integers 1 to 199. You can clearly see the effect of the overflow from x = 79 onwards (78**10 still fits in an int64, 79**10 does not). It looks random, but it isn't: the values wrap around to negative, then back to positive:

import numpy as np
import matplotlib.pyplot as plt

# Explicit int64 so the overflow point is the same on every platform
plt.plot(np.arange(1, 200, dtype=np.int64)**10)
plt.show()
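
To locate the first overflow without reading it off the plot, you can compare the int64 results against Python's exact integers. A small sketch:

```python
import numpy as np

x = np.arange(1, 200, dtype=np.int64)
exact = [int(v) ** 10 for v in x]     # Python ints, never overflow
wrapped = (x ** 10).tolist()          # int64, wraps past 2**63 - 1

# First x whose int64 result differs from the exact value
first_bad = next(int(v) for v, e, w in zip(x, exact, wrapped) if e != w)
print(first_bad)                      # 79
print(wrapped[first_bad - 1] < 0)     # True: wrapped around to negative
```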