I am trying to sort the array by the second row. For n1, I use two pairs of square brackets, [][], whereas for n2 I use a single pair with a comma, [,]. Why do I get different results?
Thank you in advance.
import numpy
sampleArray = numpy.array([[34, 43, 73], [82, 22, 12], [53, 94, 66]])
print(sampleArray)
n1 = sampleArray[:][sampleArray[1,:].argsort()]
n2 = sampleArray[:,sampleArray[1,:].argsort()]
print(n1)
print(n2)
At the interpreter level, you are doing the following:
someindex = sampleArray[1, :].argsort()
n1 = sampleArray.__getitem__(slice(None)).__getitem__(someindex)
n2 = sampleArray.__getitem__((slice(None), someindex))
The first call to __getitem__(slice(None)) in n1 is effectively a no-op: it just returns a view of the entire original array. The fact that it's technically a separate object won't affect the subsequent read. So n1 is an application of someindex along the rows.
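As a quick check of the n1 case, here is a minimal sketch that reuses the array from the question. The name whole is introduced only for illustration.

```python
import numpy

sampleArray = numpy.array([[34, 43, 73], [82, 22, 12], [53, 94, 66]])
someindex = sampleArray[1, :].argsort()  # order that sorts the second row

# sampleArray[:] is a basic slice, so it is a view onto the same data.
whole = sampleArray[:]
print(numpy.shares_memory(whole, sampleArray))  # True

# Indexing the view with someindex therefore reorders the rows,
# exactly as indexing the original array would.
n1 = sampleArray[:][someindex]
print(someindex)                                    # [2 1 0]
print(numpy.array_equal(n1, sampleArray[someindex]))  # True
print(n1)
```

Here the second row happens to be in descending order, so someindex is [2 1 0] and n1 is simply the original array with its rows reversed.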
For n2, you pass in a tuple of indices (remember that it's commas that make a tuple, not parentheses). When given a tuple as the argument to __getitem__, numpy arrays apply its elements to successive dimensions, starting with the first. In this case, slice(None) selects all rows, while someindex reorders the columns.
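To see which key the brackets actually hand over, one option is a tiny stand-in class that returns its key unchanged. KeyRecorder is a hypothetical helper written for this sketch and is not part of numpy.

```python
import numpy

class KeyRecorder:
    """Hypothetical helper: returns whatever key the [] syntax passes in."""
    def __getitem__(self, key):
        return key

rec = KeyRecorder()
single_key = rec[:]     # a lone slice object
tuple_key = rec[:, 1]   # the comma builds a tuple: (slice, 1)
print(single_key)
print(tuple_key)

sampleArray = numpy.array([[34, 43, 73], [82, 22, 12], [53, 94, 66]])
someindex = sampleArray[1, :].argsort()

# Passing the tuple explicitly is the same as writing sampleArray[:, someindex]:
# all rows are kept and the columns are reordered.
n2 = sampleArray[(slice(None), someindex)]
print(numpy.array_equal(n2, sampleArray[:, someindex]))  # True
print(n2)
```

In n2 the second row comes out as 12, 22, 82, which is the sort by the second row that the question was after.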
Moral of the story: multidimensional numpy indices are not separable into a series of list-like indices. This is especially important for assignments: x[a, b] = c is x.__setitem__((a, b), c), while x[a][b] = c is x.__getitem__(a).__setitem__(b, c). The first case does what you generally expect, and modifies x, but can be difficult to construct, e.g., if a is a mask. The second case is often easier to construct indices for, but when a is a mask or an index array, x[a] is a temporary copy, so the assignment never reaches the original array. (If a is a plain slice or integer, x[a] is a view and the write does go through.) Stack Overflow has its share of questions about this variant of the scenario.
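The assignment pitfall can be reproduced with a small sketch. The arrays x and y and the mask below are made up for illustration.

```python
import numpy

x = numpy.arange(12).reshape(3, 4)
mask = numpy.array([True, False, True])

# Chained form: x[mask] is a temporary copy, so the write is lost.
x[mask][:, 0] = -1
unchanged = x[:, 0].tolist()
print(unchanged)   # [0, 4, 8]

# Single tuple index: one __setitem__ call on x itself.
x[mask, 0] = -1
changed = x[:, 0].tolist()
print(changed)     # [-1, 4, -1]

# Chaining does write through when the first index is a basic slice,
# because y[0:2] is a view rather than a copy.
y = numpy.arange(12).reshape(3, 4)
y[0:2][1] = 99
print(y[1].tolist())  # [99, 99, 99, 99]
```

So whether the chained form modifies the original depends on whether the first index yields a view or a copy, which is why the single-tuple form is the safer habit for assignments.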