Tags: python, pandas, numpy, data-science, scientific-computing

How to deal with large integers in NumPy?


I'm working on a data analysis project that involves very large numbers. I originally did everything in pure Python, but I'm now trying to move to NumPy and pandas. I've hit a roadblock: NumPy doesn't seem to handle integers larger than 64 bits (Python ints placed in a NumPy array max out at 9223372036854775807). Do I have to give up on NumPy and pandas entirely, or is there a way to use them with Python-style arbitrarily large integers? I'm okay with a performance hit.


Solution

  • By default, NumPy stores elements in a fixed-width numeric dtype such as int64. You can force the dtype to object instead, so that each element is an ordinary arbitrary-precision Python int:

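    import numpy as np

    # dtype=object keeps each element as a Python int (arbitrary precision)
    x = np.array([10, 20, 30, 40], dtype=object)
    x_exp2 = 1000**x  # element-wise: 1000**10, 1000**20, ...
    print(x_exp2)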
    

    The output is:

    [1000000000000000000000000000000
     1000000000000000000000000000000000000000000000000000000000000
     1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
     1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
    

    The drawback is that execution is much slower, because every operation goes through Python objects instead of NumPy's native fixed-width arithmetic.
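
To get a rough feel for that slowdown, here is a minimal timing sketch. The array size, the repetition count, and the variable names (`fixed`, `boxed`) are arbitrary choices for illustration, and the absolute timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

# The same small values stored two ways: fixed-width int64 vs. Python int objects.
values = list(range(1, 100_001))
fixed = np.array(values, dtype=np.int64)
boxed = np.array(values, dtype=object)

# Both give the same sum here because nothing overflows 64 bits.
assert int(fixed.sum()) == boxed.sum()

# Time 20 repetitions of each sum; only the ratio between them is meaningful.
t_fixed = timeit.timeit(fixed.sum, number=20)
t_boxed = timeit.timeit(boxed.sum, number=20)
print(f"int64:  {t_fixed:.4f} s")
print(f"object: {t_boxed:.4f} s")
```

If the object-dtype cost turns out to be acceptable for your data sizes, the approach above is probably the simplest option.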

    Later edit: np.sum() and np.prod() also work on object arrays, although other NumPy functions may not support the object dtype.

    import numpy as np

    x = np.array([10, 20, 30, 40], dtype=object)
    x_exp2 = 1000**x

    print(x_exp2)
    print(np.sum(x_exp2))   # exact sum, returned as a Python int
    print(np.prod(x_exp2))  # exact product, returned as a Python int
    

    and the output is:

    [1000000000000000000000000000000
     1000000000000000000000000000000000000000000000000000000000000
     1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
     1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
    1000000000000000000000000000001000000000000000000000000000001000000000000000000000000000001000000000000000000000000000000
    1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
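
As a rough illustration of the limitations mentioned above, and of how the same idea carries over to pandas, here is a small sketch. It assumes Python 3.8+ (for `math.isqrt`) and that pandas is installed; the variable names are made up for the example, and the workaround shown is just one option, not the only one.

```python
import math
import numpy as np
import pandas as pd

big = np.array([10**30, 10**60], dtype=object)

# Arithmetic operators are delegated to the Python ints, so results stay exact.
assert (big * big)[1] == 10**120

# Some ufuncs have no Python-int equivalent. np.sqrt looks for a .sqrt()
# method on each element, and int has none, so the call raises.
try:
    np.sqrt(big)
    sqrt_failed = False
except (TypeError, AttributeError):
    sqrt_failed = True

# One possible workaround: apply a pure-Python function element by element.
roots = np.array([math.isqrt(v) for v in big], dtype=object)

# pandas accepts the same trick: an object-dtype Series holds Python ints.
s = pd.Series([10**30, 10**60], dtype=object)
total = s.sum()   # exact Python int
doubled = s * 2   # element-wise multiplication on the Python ints

print(sqrt_failed)
print(roots)
print(total)
```

The trade-off is the same in pandas as in NumPy: values stay exact, but operations run at Python speed and some numeric functions will not accept the object dtype.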