
Rationale for numpy.split returning a list and not an array


I was surprised that numpy.split yields a list and not an array. I would have thought it would be better to return an array, since numpy has put a lot of work into making arrays more useful than lists. Can anyone justify numpy returning a list instead of an array? Why would that be a better programming decision for the numpy developers to have made?


Solution

  • A comment pointed out that if the split is uneven, the result can't be an array, at least not one with the same dtype as the input. At best it would be an array of object dtype holding the sub-arrays.
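    The uneven case can be seen in a quick sketch, assuming a recent NumPy (the variable names are only for illustration). np.split with an explicit index list gives pieces of different lengths, and an integer section count that does not divide the length raises ValueError. Holding the ragged pieces in an array needs an object-dtype container, filled element by element so NumPy does not try to build a 2-D array.

    ```python
    import numpy as np

    x = np.arange(10)

    # Splitting at index 3 gives pieces of length 3 and 7.
    pieces = np.split(x, [3])
    print([p.shape for p in pieces])  # [(3,), (7,)]

    # An integer section count must divide the axis length evenly.
    try:
        np.split(x, 3)
        uneven_raised = False
    except ValueError:
        uneven_raised = True
    print(uneven_raised)  # True

    # Ragged pieces cannot share one integer array; the closest
    # array container is a 1-D object array of sub-arrays.
    holder = np.empty(len(pieces), dtype=object)
    for i, p in enumerate(pieces):
        holder[i] = p
    print(holder.dtype, holder.shape)  # object (2,)
    ```

    Such an object array offers none of the vectorized benefits of a regular array; it behaves much like a list with extra overhead.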

    But let's consider the case of equal-length subarrays:

    In [124]: x = np.arange(10)
    In [125]: np.split(x,2)
    Out[125]: [array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9])]
    In [126]: np.array(_)     # make an array from that
    Out[126]: 
    array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9]])
    

    But we can get the same array without split, just by reshaping:

    In [127]: x.reshape(2,-1)
    Out[127]: 
    array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9]])
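    A small check of that equivalence, as a sketch: both routes give equal values, but reshape returns a view into x, while np.array(...) applied to the list of pieces copies the data into a new array.

    ```python
    import numpy as np

    x = np.arange(10)

    via_split = np.array(np.split(x, 2))  # list of 2 pieces -> new 2-D array
    via_reshape = x.reshape(2, -1)        # same layout, no list in between

    print(np.array_equal(via_split, via_reshape))  # True
    # reshape of a contiguous array is a view; np.array copies by default.
    print(np.shares_memory(via_reshape, x))  # True
    print(np.shares_memory(via_split, x))    # False
    ```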
    

    Now look at the code for split. It checks that an integer section count divides the axis evenly and then passes the task to array_split. Ignoring the details about alternative axes, array_split just does:

    sub_arys = []
    for i in range(Nsections):
        # st and end are successive entries of the precomputed div_points
        st = div_points[i]
        end = div_points[i + 1]
        sub_arys.append(sary[st:end])
    return sub_arys
    

    In other words, it just steps through the array and returns successive slices. When the input is already an ndarray, those slices are views of the original, not copies.
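    The view behaviour can be checked directly. This sketch writes into one piece returned by np.split and observes the change in the original array.

    ```python
    import numpy as np

    x = np.arange(10)
    first, second = np.split(x, 2)

    # Each piece is a slice of x, so it shares x's memory.
    print(np.shares_memory(first, x), np.shares_memory(second, x))  # True True

    # Writing through a piece therefore modifies x itself.
    second[0] = 99
    print(x.tolist())  # [0, 1, 2, 3, 4, 99, 6, 7, 8, 9]
    ```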

    So split is not that sophisticated a function. You could generate such a list of subarrays yourself without much numpy expertise.
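    For instance, here is a minimal re-implementation for a 1-D array and an integer section count. The function name simple_split is hypothetical; it follows the sizing rule documented for np.array_split (the first len % n pieces get one extra element) and is compared against np.array_split itself.

    ```python
    import numpy as np

    def simple_split(ary, n):
        """Split a 1-D array into n nearly equal, consecutive slices (hypothetical helper)."""
        size, extras = divmod(len(ary), n)
        # The first `extras` pieces get one extra element, as np.array_split documents.
        sizes = [size + 1] * extras + [size] * (n - extras)
        pieces, start = [], 0
        for s in sizes:
            pieces.append(ary[start:start + s])  # a plain slice, hence a view
            start += s
        return pieces

    x = np.arange(10)
    mine = simple_split(x, 3)
    ref = np.array_split(x, 3)
    print([p.tolist() for p in mine])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(all(np.array_equal(a, b) for a, b in zip(mine, ref)))  # True
    ```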

    Another point: the documentation notes that split can be reversed with an appropriate stack. concatenate (and its family) takes a list of arrays. If given an array of arrays, or a higher-dimensional array, it effectively iterates over the first dimension, e.g. concatenate(arr) => concatenate(list(arr)).
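    Both points in a short sketch: np.concatenate undoes np.split, and passing a 2-D array instead of a list gives the same result as passing the list of its rows.

    ```python
    import numpy as np

    x = np.arange(10)
    parts = np.split(x, 2)

    # concatenate joins the list of pieces back into the original 1-D array.
    print(np.array_equal(np.concatenate(parts), x))  # True

    # A 2-D array is iterated over its first axis, i.e. treated like a list of rows.
    arr = x.reshape(2, -1)
    print(np.array_equal(np.concatenate(arr), np.concatenate(list(arr))))  # True

    # np.stack instead adds a new axis, rebuilding the 2-D layout.
    print(np.stack(parts).shape)  # (2, 5)
    ```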