Tags: python, numpy, root, awkward-array

Convert array of varying sized arrays to numpy array


I am working with a ROOT file (an array of arrays). When I load the array into Python, I get an Awkward Array, since it is an array of arrays of varying sizes. I would like to learn how to convert this to a NumPy array of equal-sized arrays by populating the empty elements with NaNs. How can I convert an Awkward Array of varying size to a NumPy array?


Solution

  • Suppose that you have an array of variable-length lists a:

    >>> import numpy as np
    >>> import awkward as ak
    >>> a = ak.Array([[0, 1, 2], [], [3, 4], [5], [6, 7, 8, 9]])
    >>> a
    <Array [[0, 1, 2], [], ... [5], [6, 7, 8, 9]] type='5 * var * int64'>
    

    The function that makes all lists have the same size is ak.pad_none. But first, we need a size to pad to. We can get the length of each list with ak.num and then take the maximum of those lengths with np.max.

    >>> ak.num(a)
    <Array [3, 0, 2, 1, 4] type='5 * int64'>
    >>> desired_length = np.max(ak.num(a))
    >>> desired_length
    4
    

    Now we can pad the array and convert the result into a NumPy array (because it now has a rectangular shape).

    >>> ak.pad_none(a, desired_length)
    <Array [[0, 1, 2, None], ... [6, 7, 8, 9]] type='5 * var * ?int64'>
    >>> ak.to_numpy(ak.pad_none(a, desired_length))
    masked_array(
      data=[[0, 1, 2, --],
            [--, --, --, --],
            [3, 4, --, --],
            [5, --, --, --],
            [6, 7, 8, 9]],
      mask=[[False, False, False,  True],
            [ True,  True,  True,  True],
            [False, False,  True,  True],
            [False,  True,  True,  True],
            [False, False, False, False]],
      fill_value=999999)
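
Since the question asks for NaNs, here is a minimal sketch of turning a masked array like the one above into a plain float array. To keep the snippet runnable without Awkward Array, it rebuilds the masked array by hand; the names `masked`, `with_nans`, and `row_sums` are illustrative only.

```python
import numpy as np

# Stand-in for the masked array that ak.to_numpy returned above,
# rebuilt by hand so this snippet runs on NumPy alone.
data = np.array([[0, 1, 2, 0],
                 [0, 0, 0, 0],
                 [3, 4, 0, 0],
                 [5, 0, 0, 0],
                 [6, 7, 8, 9]])
mask = np.array([[False, False, False, True],
                 [True, True, True, True],
                 [False, False, True, True],
                 [False, True, True, True],
                 [False, False, False, False]])
masked = np.ma.masked_array(data, mask=mask)

# NaN is a floating-point value, so cast away from int64 before filling.
with_nans = masked.astype(np.float64).filled(np.nan)

# Reductions on the masked array skip the padded slots; a fully masked
# row yields a masked result, replaced here by 0.
row_sums = masked.sum(axis=1).filled(0)
```

Keeping the masked array can be the better choice when the padding should simply be ignored in sums or means, while the NaN-filled copy is convenient for code that expects an ordinary ndarray.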
    

    The missing values (None) become masked entries of a NumPy masked array. If you want a plain NumPy array, you can use ak.fill_none to give them a replacement value.

    >>> ak.to_numpy(ak.fill_none(ak.pad_none(a, desired_length), 999))
    array([[  0,   1,   2, 999],
           [999, 999, 999, 999],
           [  3,   4, 999, 999],
           [  5, 999, 999, 999],
           [  6,   7,   8,   9]])