python arrays numpy broadcasting

Is it possible to find similarities between rows in a matrix without a loop?


I have a 2D numpy array. I'm trying to compute the similarities between its rows and store them in a similarities array. Is this possible without a loop? Thanks for your time!

# ratings.shape = (943, 1682)

arri = np.zeros(943)
arri = np.where(arri == 0)[0]

arrj = np.zeros(943)
arrj = np.where(arrj ==0)[0]

similarities = np.zeros((ratings.shape[0], ratings.shape[0]))

similarities[arri, arrj] = np.abs(ratings[arri]-ratings[arrj])

I want to build a 2D array similarities in which similarities[i, j] is the difference between row i and row j of ratings.

ValueError: shape mismatch: value array of shape (943,1682) could not be broadcast to indexing result of shape (943,)


Solution

  • The problem is how numpy pairs up the index arrays when you index a two-dimensional array with two arrays.


    First some setup:

    import numpy
    
    ratings = numpy.arange(1, 6)
    
    indicesX = numpy.indices((ratings.shape[0],1))[0]
    indicesY = numpy.indices((ratings.shape[0],1))[0]
    

    ratings: [1 2 3 4 5]

    indicesX: [[0][1][2][3][4]]

    indicesY: [[0][1][2][3][4]]


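    Now let's see what your approach produces (subtracting ratings[0] instead of ratings[indicesY], so that the entries being written are visible):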

    similarities = numpy.zeros((ratings.shape[0], ratings.shape[0]))
    similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[0])
    

    similarities:

    [[0. 0. 0. 0. 0.]
     [0. 1. 0. 0. 0.]
     [0. 0. 2. 0. 0.]
     [0. 0. 0. 3. 0.]
     [0. 0. 0. 0. 4.]]
    

    As you can see, numpy iterates over similarities basically like the following:

    for i in range(5):
        similarities[indicesX[i], indicesY[i]] = numpy.abs(ratings[i]-ratings[0])
    

    similarities:

    [[0. 0. 0. 0. 0.]
     [0. 1. 0. 0. 0.]
     [0. 0. 2. 0. 0.]
     [0. 0. 0. 3. 0.]
     [0. 0. 0. 0. 4.]]
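The same pairing also explains the ValueError in the question. Below is a minimal sketch that reproduces it, assuming a small 4x3 stand-in for the real ratings matrix; the names ratings2d, rows, cols and target are made up for this illustration.

```python
import numpy

# Small stand-in for the (943, 1682) ratings matrix from the question.
ratings2d = numpy.arange(12).reshape(4, 3)
n = ratings2d.shape[0]

rows = numpy.arange(n)
cols = numpy.arange(n)

target = numpy.zeros((n, n))

# Two 1-D index arrays are paired element by element,
# so only n entries (the diagonal here) are selected.
selected_shape = target[rows, cols].shape

# Subtracting whole rows keeps every column, so the value is 2-D.
value_shape = (ratings2d[rows] - ratings2d[cols]).shape

try:
    target[rows, cols] = numpy.abs(ratings2d[rows] - ratings2d[cols])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(selected_shape, value_shape)
print(error_message)
```

The indexing result has one slot per index pair, while the right-hand side has one full row per pair, so the assignment cannot be broadcast.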
    

    Now instead we need indices like the following to iterate through the entire array:

    indicesX = [0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,0,1,2,3,4,0,1,2,3,4]
    indicesY = [0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4]
    

    We can build them as follows:

    # Reshape indicesX from (x,1) to (x,) so that numpy.tile() returns a flat 1-D array.
    indicesX = indicesX.reshape(indicesX.shape[0])
    indicesX = numpy.tile(indicesX, ratings.shape[0])
    
    indicesY = numpy.repeat(indicesY, ratings.shape[0])
    

    indicesX: [0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4]

    indicesY: [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4]
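As a possible shortcut, the same index pairs can probably be obtained directly from numpy.indices with a square shape. This is an alternative sketch, not the construction used in this answer; the names row_grid, col_grid, same_x and same_y are made up for the comparison.

```python
import numpy

n = 5

# tile/repeat construction, as in the answer
indicesX = numpy.tile(numpy.arange(n), n)
indicesY = numpy.repeat(numpy.arange(n), n)

# numpy.indices((n, n)) returns two (n, n) grids:
# row_grid[i, j] == i and col_grid[i, j] == j.
row_grid, col_grid = numpy.indices((n, n))

# Flattened in row-major order, the grids match the tiled/repeated arrays.
same_x = numpy.array_equal(col_grid.ravel(), indicesX)
same_y = numpy.array_equal(row_grid.ravel(), indicesY)
print(same_x, same_y)
```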

    Perfect! Now just call similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[indicesY]) again and we see:

    similarities:

    [[0. 1. 2. 3. 4.]
     [1. 0. 1. 2. 3.]
     [2. 1. 0. 1. 2.]
     [3. 2. 1. 0. 1.]
     [4. 3. 2. 1. 0.]]
    

    Here is the whole code again:

    import numpy
    
    ratings = numpy.arange(1, 6)
    
    indicesX = numpy.indices((ratings.shape[0],1))[0]
    indicesY = numpy.indices((ratings.shape[0],1))[0]
    
    similarities = numpy.zeros((ratings.shape[0], ratings.shape[0]))
    
    indicesX = indicesX.reshape(indicesX.shape[0])
    indicesX = numpy.tile(indicesX, ratings.shape[0])
    
    indicesY = numpy.repeat(indicesY, ratings.shape[0])
    
    similarities[indicesX, indicesY] = numpy.abs(ratings[indicesX]-ratings[indicesY])
    print(similarities)
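Since the question is tagged broadcasting, here is a sketch of a different technique that skips the explicit index arrays and relies on broadcasting instead. The 2-D part rests on an assumption that is not in the question: that the similarity of two rows is the sum of their absolute element-wise differences. ratings2d and similarities2d are hypothetical names.

```python
import numpy

ratings = numpy.arange(1, 6)

# ratings[:, None] has shape (5, 1) and ratings[None, :] has shape (1, 5);
# broadcasting expands the subtraction to shape (5, 5).
similarities = numpy.abs(ratings[:, None] - ratings[None, :])
print(similarities)

# 2-D case. Assumption: similarity = sum of absolute differences over columns.
ratings2d = numpy.array([[1, 2, 3],
                         [1, 2, 5],
                         [4, 2, 3]])

# Shape (3, 1, 3) minus shape (1, 3, 3) broadcasts to (3, 3, 3).
diff = numpy.abs(ratings2d[:, None, :] - ratings2d[None, :, :])

# Collapse the column axis so each row pair gets a single number.
similarities2d = diff.sum(axis=2)
print(similarities2d)
```

Note that the intermediate array in the 2-D case has n * n * m elements, which is roughly 1.5 billion for a (943, 1682) input, so it may not fit in memory without processing the rows in chunks.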
    

    PS

    You commented on your own post to improve it. When you want to improve your question, edit it instead of commenting on it.