Tags: python, numpy

Python: pick appropriate datatype size (int) automatically


I'm allocating a (possibly large) matrix of zeros with Python and numpy. I plan to put unsigned integers from 1 to N in it.

N is quite variable: could easily range from 1 all the way up to a million, perhaps even more.

I know N prior to matrix initialisation. How can I choose the data type of my matrix such that I know it can hold (unsigned) integers of size N?

Furthermore, I want to pick the smallest such data type that will do.

For example, if N were 1000, I'd pick np.dtype('uint16'). If N is 240, uint16 would work, but uint8 would also work and is the smallest data type that can hold the numbers.

This is how I initialise the array. I'm looking for the SOMETHING_DEPENDING_ON_N:

import numpy as np
# N is known by some other calculation.
lbls = np.zeros( (10,20), dtype=np.dtype( SOMETHING_DEPENDING_ON_N ) )

cheers!

Aha!

Just realised numpy v1.6.0+ has np.min_scalar_type (see the documentation). D'oh! (Although the answers below are still useful, because I don't have 1.6.0.)
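For anyone on numpy 1.6.0 or newer, a quick sketch of how np.min_scalar_type slots into the initialisation from the question (for a non-negative Python int it picks the smallest unsigned integer dtype):

```python
import numpy as np

# np.min_scalar_type returns the smallest dtype that can hold the scalar.
print(np.min_scalar_type(240))     # uint8
print(np.min_scalar_type(1000))    # uint16
print(np.min_scalar_type(10**6))   # uint32

# It drops straight into the allocation from the question:
N = 1000
lbls = np.zeros((10, 20), dtype=np.min_scalar_type(N))
print(lbls.dtype)                  # uint16
```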


Solution

  • What about writing a simple function to do the job?

    import numpy as np
    
    def type_chooser(N):
        # Try candidate dtypes from smallest to largest; dtype(-1)
        # wraps around to the maximum value the type can hold.
        for dtype in [np.uint8, np.uint16, np.uint32, np.uint64]:
            if N <= dtype(-1):
                return dtype
        raise Exception('{} is really big!'.format(N))
    

    Example usage:

    >>> type_chooser(255)
    <type 'numpy.uint8'>
    >>> type_chooser(256)
    <type 'numpy.uint16'>
    >>> type_chooser(18446744073709551615)
    <type 'numpy.uint64'>
    >>> type_chooser(18446744073709551616)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "spam.py", line 6, in type_chooser
        raise Exception('{} is really big!'.format(N))
    Exception: 18446744073709551616 is really big!
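    One caveat, in case it bites someone later: `dtype(-1)` relies on the negative value wrapping around to the type's maximum, and newer NumPy releases reject out-of-range conversions like that with an OverflowError. A sketch of an equivalent chooser that reads each maximum from np.iinfo explicitly instead (np.iinfo is a real numpy API; the ValueError message is my own choice):

    ```python
    import numpy as np

    def type_chooser(N):
        # Same idea as above, but ask np.iinfo for each dtype's
        # maximum rather than relying on dtype(-1) wrapping around.
        for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
            if N <= np.iinfo(dtype).max:
                return dtype
        raise ValueError('{} is too big for any unsigned integer dtype!'.format(N))
    ```

    This behaves the same as the version above: type_chooser(255) returns np.uint8, type_chooser(256) returns np.uint16.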