python pandas numpy matrix

How to efficiently multiply all non-diagonal elements by a constant in a pandas DataFrame?


I have a square cost matrix stored as a pandas DataFrame. Rows and columns represent positions [i, j], and I want to multiply all off-diagonal elements (where i != j) by a constant c, without using any for loops for performance reasons.

Is there an efficient way to achieve this in pandas or do I have to switch to numpy and then back to pandas to perform this task?

Example

import pandas as pd

# Sample DataFrame
cost_matrix = pd.DataFrame([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Constant
c = 4

# Desired output
#    1  8  12
#    16 5  24
#    28 32  9

Solution

  • Build a boolean mask with numpy.identity and update the underlying array in place:

    import numpy as np
    cost_matrix.values[np.identity(n=len(cost_matrix))==0] *= c
    

    Output:

        0   1   2
    0   1   8  12
    1  16   5  24
    2  28  32   9
    

    Intermediate:

    np.identity(n=len(cost_matrix))==0
    
    array([[False,  True,  True],
           [ True, False,  True],
           [ True,  True, False]])
    

    NB. For .values to be a view of the underlying array, the DataFrame must have been constructed from a homogeneous block. If it was not, convert it to one first with cost_matrix = cost_matrix.copy().
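
If you would rather not depend on .values being a writable view at all, here is a minimal sketch of a variant that is not in place, using DataFrame.where. The variable names on_diagonal and result are mine, not from the question. As far as I know, with Copy-on-Write enabled in pandas, .values can also come back read-only, which is another reason to consider this form.

```python
import numpy as np
import pandas as pd

cost_matrix = pd.DataFrame([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
c = 4

# True on the diagonal, False everywhere else
on_diagonal = np.eye(len(cost_matrix), dtype=bool)

# where() keeps the original value where the mask is True
# and takes the scaled value where it is False
result = cost_matrix.where(on_diagonal, cost_matrix * c)
print(result)
```

This returns a new DataFrame with the same values as the output shown above and leaves cost_matrix untouched, at the cost of computing cost_matrix * c for every cell.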

    Alternative

    @PaulS suggested modifying all the values and then restoring the diagonal. I would use:

    d = np.diag(cost_matrix)                 # save the original diagonal
    cost_matrix *= c                         # scale every element
    np.fill_diagonal(cost_matrix.values, d)  # write the saved diagonal back
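
The same restore-the-diagonal idea can also be done as an explicit round trip through NumPy, which is what the question asks about. This is only a sketch: scale_off_diagonal is a hypothetical helper name, and it assumes a square DataFrame with numeric values.

```python
import numpy as np
import pandas as pd

def scale_off_diagonal(df, c):
    """Return a new DataFrame with all off-diagonal entries multiplied by c."""
    arr = df.to_numpy()
    scaled = arr * c                        # new array, every element scaled
    np.fill_diagonal(scaled, np.diag(arr))  # put the original diagonal back
    return pd.DataFrame(scaled, index=df.index, columns=df.columns)

cost_matrix = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)
result = scale_off_diagonal(cost_matrix, 4)
print(result)
```

Because the scaled array is freshly allocated, this version never writes into the DataFrame's own buffer, and the row and column labels are carried over explicitly.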
    

    Timings

    The mask approach seems to be faster on small and medium-sized inputs, and the diagonal restoration faster on large inputs. (My previous timings were run online, and I could not reproduce those results with perfplot.)

    NB. The timings below were computed with c=1 or c=-1 so that the values do not grow exponentially over repeated runs.

    [Figure: timing comparison of the mask and diagonal-restoration approaches]
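
To re-run a comparison like this on your own machine, a rough sketch with the standard-library timeit module could look like the following. It is not the perfplot setup used for the figure. It times both ideas on plain NumPy arrays, and the function names, sizes and repeat counts are arbitrary choices of mine.

```python
import timeit
import numpy as np

def mask_approach(a, c):
    a = a.copy()                      # work on a copy so each run starts fresh
    a[np.identity(len(a)) == 0] *= c  # scale only the off-diagonal entries
    return a

def restore_diagonal(a, c):
    d = np.diag(a).copy()             # save the original diagonal
    a = a * c                         # scale everything into a new array
    np.fill_diagonal(a, d)            # write the diagonal back
    return a

rng = np.random.default_rng(0)
timings = {}
for n in (10, 100, 1000):
    a = rng.random((n, n))
    for fn in (mask_approach, restore_diagonal):
        # best of 3 repeats, 10 calls each, with c=1 as in the note above
        best = min(timeit.repeat(lambda: fn(a, 1), number=10, repeat=3))
        timings[(fn.__name__, n)] = best

for (name, n), seconds in timings.items():
    print(f"{name:>16}  n={n:<5} {seconds:.6f} s per 10 calls")
```

Absolute numbers will depend on the machine and the NumPy version, so treat the output only as a relative comparison.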