Do we already have a function similar to np.add in awkward arrays?
I am in a situation where I need to add them, and the "+" operator works fine for simple arrays but not for nested arrays.
e.g.
>>> ak.to_list(c1)
[[], [], [], [], [0.944607075944902]]
>>> ak.to_list(c2)
[[0.9800207661211596], [], [], [], []]
>>> c1+c2
Traceback (most recent call last):
  File "", line 1, in
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/numpy/lib/mixins.py", line 21, in func
    return ufunc(self, other)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/highlevel.py", line 1380, in array_ufunc
    return awkward1._connect._numpy.array_ufunc(ufunc, method, inputs, kwargs)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_connect/_numpy.py", line 107, in array_ufunc
    out = awkward1._util.broadcast_and_apply(inputs, getfunction, behavior)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 972, in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 745, in apply
    outcontent = apply(nextinputs, depth + 1)
  File "/afs/cern.ch/work/k/khurana/EXOANALYSIS/CMSSW_11_0_2/src/bbDMNanoAOD/analyzer/dependencies/lib/python3.6/site-packages/awkward1/_util.py", line 786, in apply
    nextinputs.append(x.broadcast_tooffsets64(offsets).content)
ValueError: in ListOffsetArray64, cannot broadcast nested list
(https://github.com/scikit-hep/awkward-1.0/blob/0.3.1/src/cpu-kernels/operations.cpp#L778)
The only way I can add them is by using ak.firsts and then replacing the None values with 0.
>>> z1=ak.fill_none(ak.firsts(c1),0.)
>>> z2=ak.fill_none(ak.firsts(c2),0.)
>>> z1
<Array [0, 0, 0, 0, 0.945] type='5 * float64'>
>>> z2
<Array [0.98, 0, 0, 0, 0] type='5 * float64'>
>>> z1+z2
<Array [0.98, 0, 0, 0, 0.945] type='5 * float64'>
Can something similar to np.add be devised for ak, even with limited scope/functionality? By limited scope I mean that if it worked only on ak arrays of the same dimension, it would at least serve my present purpose.
Thanks.
The exception that you saw for
>>> ak.to_list(c1)
[[], [], [], [], [0.944607075944902]]
>>> ak.to_list(c2)
[[0.9800207661211596], [], [], [], []]
>>> c1+c2
is correct: you can't add these two arrays. It's not because Awkward lacks an ak.add function. Such a thing would be identical to np.add:
>>> c1 + c1   # this actually calls np.add
<Array [[], [], [], [], [1.89]] type='5 * var * float64'>
>>> np.add(c1, c1)
<Array [[], [], [], [], [1.89]] type='5 * var * float64'>
It doesn't work because the arrays have a different number of elements at each position. It's like trying to add two NumPy arrays with different shapes. (You can add NumPy arrays with certain different shapes, just as you can add Awkward arrays with certain different shapes, if they broadcast. These don't.)
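The NumPy analogy can be sketched as follows. This is only a minimal illustration, and the shapes are chosen here for demonstration (they are not taken from the question): shapes (3, 1) and (1, 4) broadcast, while lengths 3 and 4 do not.

```python
import numpy as np

# Shapes (3, 1) and (1, 4) are compatible: each size-1 axis is stretched.
a = np.ones((3, 1))
b = np.ones((1, 4))
broadcast_shape = (a + b).shape

# Lengths 3 and 4 are incompatible: they differ and neither is 1.
try:
    np.ones(3) + np.ones(4)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(broadcast_shape, mismatch_raised)
```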
If you want an empty list to behave like a list with a zero in it, then you did the right thing. ak.firsts and ak.singletons convert between two ways of representing missing data: None versus an empty list. In some languages, a missing or potentially missing value is treated as a length-0 or length-1 list, such as Scala's Option type. Thus,
>>> ak.firsts(c1)
<Array [None, None, None, None, 0.945] type='5 * ?float64'>
presumes that you were starting from empty-or-singleton lists (which appears to be true in your examples) and converts them to an option-type array with one less level of depth. Applying ak.fill_none then says that you want these missing values (which came from empty lists) to act like zeros for addition, and you got what you wanted:
>>> ak.fill_none(ak.firsts(c1), 0) + ak.fill_none(ak.firsts(c2), 0)
<Array [0.98, 0, 0, 0, 0.945] type='5 * float64'>
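To make the empty-or-singleton correspondence concrete, here is a pure-Python model of what these operations do on plain lists. The helper names merely mirror the Awkward functions; they are illustrative stand-ins written for this example, not Awkward's implementation.

```python
def firsts(lists):
    """First item of each list, or None where the list is empty."""
    return [xs[0] if xs else None for xs in lists]

def singletons(options):
    """Inverse direction: None becomes [], a value x becomes [x]."""
    return [[] if x is None else [x] for x in options]

def fill_none(options, value):
    """Replace each None with the given value."""
    return [value if x is None else x for x in options]

c1 = [[], [], [], [], [0.944607075944902]]
c2 = [[0.9800207661211596], [], [], [], []]

z1 = fill_none(firsts(c1), 0.0)
z2 = fill_none(firsts(c2), 0.0)
total = [x + y for x, y in zip(z1, z2)]

# For lists with at most one item, singletons undoes firsts.
roundtrip = singletons(firsts(c1))
```

The round trip only holds when every list has zero or one items; a longer list would lose everything after its first element.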
One thing that's not clear from your data is whether you always expect the lists to have at most one item: ak.firsts will only pull the first item out of each list. If you had
>>> c1 = ak.Array([[], [], [], [], [0.999, 0.123]])
>>> c2 = ak.Array([[0.98], [], [], [], []])
then
>>> ak.fill_none(ak.firsts(c1), 0) + ak.fill_none(ak.firsts(c2), 0)
<Array [0.98, 0, 0, 0, 0.999] type='5 * float64'>
might not be what you want, since it drops the 0.123. You might actually want to ak.pad_none each list to have at least one element, like this:
>>> ak.pad_none(c1, 1)
<Array [[None], [None], ... [0.999, 0.123]] type='5 * var * ?float64'>
>>> ak.fill_none(ak.pad_none(c1, 1), 0)
<Array [[0], [0], [0], [0], [0.999, 0.123]] type='5 * var * float64'>
This maintains the structure, distinguishing between list lengths for all lengths except for 0 and 1, because empty lists have been converted into [0]. You can't use this for adding unless these longer lists match lengths (back to your original problem), but you can arrange for that, too.
>>> ak.fill_none(ak.pad_none(c1, 2), 0) + ak.fill_none(ak.pad_none(c2, 2), 0)
<Array [[0.98, 0], [0, ... 0], [0.999, 0.123]] type='5 * var * float64'>
It all depends on what structures you have and what structures you want. It wouldn't be a good idea to create a new function that does one of the two things above, especially if it has a name that's dangerously close to a NumPy function's, like np.add, because it works in a different way that would have to be explained for anyone to safely use it. If you want to do a specialized thing, it's safer to have you build it out of simpler primitives (even if you wrap it up as a convenience function in your own work), because then you know what rules it follows.