python pandas

Strange result when raising a pandas integer Series to a power


Raising a pandas integer Series to a power seems to give the wrong result.

import pandas as pd

# Standard Python
42**42
# 150130937545296572356771972164254457814047970568738777235893533016064

# Pandas series, float dtype
s = pd.Series([12, 42], index=range(2), dtype=float)
s**42
# 0    2.116471e+45
# 1    1.501309e+68
# dtype: float64

# Pandas series, integer dtype
s = pd.Series([12, 42], index=range(2), dtype=int)
s**42
# 0                      0
# 1    4121466560160202752
# dtype: int64

How come?


Solution

  • Python integers have arbitrary precision. Pandas integer columns are backed by NumPy int64 values, which overflow past 9223372036854775807 (2**63 - 1):

    import numpy as np
    np.array([12, 42])**42
    # array([                  0, 4121466560160202752])
    

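    The exact result is simply too big to represent as a 64-bit integer in pandas/NumPy, so it silently wraps around modulo 2**64. That is also why 12**42 comes out as 0: it equals 2**84 * 3**42, which is an exact multiple of 2**64.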
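The wraparound can be reproduced in plain Python. This is a minimal sketch, assuming int64 overflow behaves as two's-complement wraparound modulo 2**64 (which matches the outputs shown above). `wrap_int64` is a hypothetical helper written for illustration, not a NumPy or pandas function:

```python
def wrap_int64(n: int) -> int:
    """Reduce an exact Python int to the value a wrapping int64 would hold."""
    # Shift into [0, 2**64), reduce modulo 2**64, then shift back
    # into the signed range [-2**63, 2**63).
    return (n + 2**63) % 2**64 - 2**63


print(wrap_int64(12**42))  # 0, since 12**42 is a multiple of 2**64
print(wrap_int64(42**42))  # 4121466560160202752, same as the int64 result
```

Values inside the int64 range pass through unchanged, and the first value past the maximum, 2**63, wraps to -2**63.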

    NB. Floating-point values, as their name indicates, trade precision for range. A float64 has 11 exponent bits, so it can represent magnitudes up to about 1.8e308 (just under 2**1024), but only with roughly 15 to 17 significant decimal digits.
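To make the float trade-off concrete, here is a small sketch using only the standard library. It assumes Python's `float` is an IEEE 754 float64, which is the case on all common platforms. The variable names are illustrative:

```python
import sys

exact = 42**42          # exact arbitrary-precision Python int
approx = float(exact)   # rounded to the nearest float64

# The magnitude survives, but not every digit: a float64 only has
# a 53-bit significand, far fewer bits than 42**42 needs.
is_exact = int(approx) == exact

# Largest finite float64, just below 2**1024.
largest = sys.float_info.max

print(approx, is_exact, largest)
```

So the float result `1.501309e+68` in the question is the right order of magnitude, but it is an approximation rather than the exact integer.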


    To give you a visual representation, here is a graph of x**10 for the integers 1 to 199. You can clearly see the effect of the overflow after x = 78. It looks random, but it isn't: the values wrap around to negative, then back to positive, and so on:

    import numpy as np
    import matplotlib.pyplot as plt
    
    plt.plot(np.arange(1, 200)**10)
    

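If the exact integer result is needed, one possible workaround (a sketch, not something the answer above prescribes) is to store the values as Python objects with `dtype=object`, so that `**` falls back to Python's arbitrary-precision integers element by element:

```python
import pandas as pd

# dtype=object keeps the values as Python int objects, so the power
# is computed with arbitrary precision instead of wrapping int64.
s = pd.Series([12, 42], dtype=object)
exact = s**42

print(exact.iloc[1] == 42**42)  # True
```

Object columns are typically much slower and use more memory than int64 ones, so this is only reasonable for small data. If an approximate magnitude is enough, converting with `s.astype(float)` before exponentiating avoids the overflow, as in the float example from the question.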