I have a square cost matrix stored as a pandas DataFrame. Rows and columns represent positions [i, j], and I want to multiply all off-diagonal elements (where i != j) by a constant c, without using any for loops for performance reasons.
Is there an efficient way to achieve this in pandas or do I have to switch to numpy and then back to pandas to perform this task?
Example
import pandas as pd
# Sample DataFrame
cost_matrix = pd.DataFrame([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
# Constant
c = 4
# Desired output
# 1 8 12
# 16 5 24
# 28 32 9
Build a boolean mask with numpy.identity and update the underlying array in place:
import numpy as np
cost_matrix.values[np.identity(n=len(cost_matrix))==0] *= c
output:
    0   1   2
0   1   8  12
1  16   5  24
2  28  32   9
Intermediate:
np.identity(n=len(cost_matrix))==0
array([[False,  True,  True],
       [ True, False,  True],
       [ True,  True, False]])
NB. for .values to be a view of the underlying array, the DataFrame must be backed by a single homogeneous block. If it is not, it should be converted to one using cost_matrix = cost_matrix.copy().
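If you would rather not depend on .values being a writable view (this depends on the DataFrame's internal layout and on the pandas version), a non-in-place variant of the same mask idea is possible. This is only a sketch: it swaps in DataFrame.where and np.eye instead of the in-place update above, and the names on_diagonal and result are illustrative.

```python
import numpy as np
import pandas as pd

cost_matrix = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
c = 4

# Boolean mask: True on the diagonal, False everywhere else
on_diagonal = np.eye(len(cost_matrix), dtype=bool)

# Keep the original value where the mask is True,
# otherwise take the value multiplied by c.
# This returns a new DataFrame and leaves cost_matrix untouched.
result = cost_matrix.where(on_diagonal, cost_matrix * c)
print(result)
```

This trades the in-place update for one extra temporary (cost_matrix * c), so it is likely slower than writing into the array directly, but it does not rely on view semantics.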
@PaulS suggested modifying all the values and then restoring the diagonal. I would use:
d = np.diag(cost_matrix)
cost_matrix *= c
np.fill_diagonal(cost_matrix.values, d)
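For reference, the same diagonal-restoration idea can also be written as an explicit numpy round trip, which avoids writing through .values. This is a sketch under the assumption that building a new DataFrame is acceptable; the names arr, diag and scaled are illustrative.

```python
import numpy as np
import pandas as pd

cost_matrix = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
c = 4

# Work on an explicit, writable copy of the data
arr = cost_matrix.to_numpy(copy=True)

# Save the diagonal first; copy it because diagonal() returns a view
diag = arr.diagonal().copy()

# Scale everything, then put the original diagonal back
arr *= c
np.fill_diagonal(arr, diag)

# Rebuild a DataFrame with the original labels
scaled = pd.DataFrame(arr, index=cost_matrix.index, columns=cost_matrix.columns)
print(scaled)
```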
The mask approach seems to be faster on small/medium-sized inputs, and the diagonal restoration faster on large inputs. (My previous timings were performed online, and I could not reproduce those results with perfplot.)
NB. the timings below were computed with c=1 or c=-1 to avoid increasing the values exponentially during the timing.
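A rough way to compare the two approaches yourself is sketched below. It is not the perfplot setup used for the timings mentioned above: it uses the standard timeit module, operates on plain numpy arrays rather than on a DataFrame, and the size n, the run count and the function names are arbitrary choices for illustration. With c = -1 each run only flips signs, so repeated in-place runs never grow the values.

```python
import timeit
import numpy as np

n = 200   # hypothetical matrix size; vary this to see the crossover
c = -1    # |c| == 1 keeps magnitudes constant across repeated in-place runs
original = np.arange(n * n, dtype=np.int64).reshape(n, n)

def scale_with_mask(arr):
    # Multiply only the off-diagonal entries, in place
    arr[~np.eye(len(arr), dtype=bool)] *= c

def scale_and_restore(arr):
    # Multiply everything in place, then restore the saved diagonal
    diag = arr.diagonal().copy()
    arr *= c
    np.fill_diagonal(arr, diag)

a = original.copy()
b = original.copy()
runs = 100  # even, so with c = -1 both arrays end up back at their start values

t_mask = timeit.timeit(lambda: scale_with_mask(a), number=runs)
t_restore = timeit.timeit(lambda: scale_and_restore(b), number=runs)
print(f"mask: {t_mask:.4f}s  restore: {t_restore:.4f}s")
```

The absolute numbers depend on the machine and on n, so treat them only as a relative comparison.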