
Convert Python sequence to NumPy array, filling missing values


Implicitly converting a Python sequence of variable-length lists to a NumPy array produces an array of type object.

v = [[1], [1, 2]]
np.array(v)
array([[1], [1, 2]], dtype=object)

Trying to force another type will cause an exception:

np.array(v, dtype=np.int32)
ValueError: setting an array element with a sequence.

What is the most efficient way to get a dense NumPy array of type int32, by filling the "missing" values with a given placeholder?

With 0 as the placeholder, my sample sequence v should become something like this:

array([[1, 0], [1, 2]], dtype=int32)

Solution

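  • You can use itertools.zip_longest, which zips the lists together and pads the shorter ones with fillvalue. The zip produces columns, so transpose the result with .T to get the original rows back: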

    import itertools
    import numpy as np
    np.array(list(itertools.zip_longest(*v, fillvalue=0))).T
    Out: 
    array([[1, 0],
           [1, 2]])
    

    Note: In Python 2, the function is named itertools.izip_longest.
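
The question asks for int32 specifically, while the snippet above returns NumPy's default integer type. A minimal sketch of the same zip_longest approach with the dtype passed explicitly might look like this; the helper name pad_to_array and its parameters are hypothetical, chosen only for illustration:

```python
import itertools

import numpy as np


def pad_to_array(seqs, fillvalue=0, dtype=np.int32):
    """Hypothetical helper: pad ragged rows with `fillvalue`, return a dense 2-D array."""
    # zip_longest yields one tuple per column, padding exhausted rows with fillvalue.
    columns = list(itertools.zip_longest(*seqs, fillvalue=fillvalue))
    # Transpose so that each input sequence becomes a row again.
    return np.array(columns, dtype=dtype).T


v = [[1], [1, 2]]
result = pad_to_array(v)
print(result)
```

For the sample sequence v this should give a 2x2 array of dtype int32 with the missing entry filled by 0. The fill value has to be representable in the requested dtype.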