I want to construct a matrix of shape (N,2N)
in Python.
I can construct the matrix as follows
import numpy as np
N = 10 # 10,100,1000, whatever
some_vector = np.random.uniform(size=N)
some_matrix = np.zeros((N, 2*N))
for i in range(N):
    some_matrix[i, 2*i] = 1
    some_matrix[i, 2*i + 1] = some_vector[i]
So the result is a matrix of mostly zeros, but where on row i, the 2*i and 2*i + 1 columns are populated.
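For example, with N = 3 the result has this pattern, where v0, v1, v2 denote the entries of some_vector:

[[1, v0, 0, 0,  0, 0 ],
 [0, 0,  1, v1, 0, 0 ],
 [0, 0,  0, 0,  1, v2]]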
Is there a faster way to construct the matrix without the loop? Feels like there should be some broadcasting operation...
Edit: There have been some very fast, great answers! I am going to extend the question a touch to my actual use case.
Now suppose some_vector has shape (N, T). I want to construct a matrix of shape (N, 2*N, T) analogously to the previous case. The naive approach is:
N = 10 # 10,100,1000, whatever
T = 500 # or whatever
some_vector = np.random.uniform(size=(N,T))
some_matrix = np.zeros((N, 2*N,T))
for i in range(N):
    for t in range(T):
        some_matrix[i, 2*i, t] = 1
        some_matrix[i, 2*i + 1, t] = some_vector[i, t]
Can we extend the previous answers to this new case?
Variant 1
You can replace the loop with sliced assignments that interleave the columns of an identity matrix with the columns of a diagonal matrix:
a = np.eye(N)
b = np.diag(some_vector)
c = np.empty((N, 2*N))
c[:, 0::2] = a
c[:, 1::2] = b
This is concise, but not optimal. It requires allocating unnecessary intermediate matrices (a and b), which becomes increasingly expensive as N grows large. It also writes to every position in c, instead of just the non-zero entries. Depending on the value of N, this may be faster or slower than the implementation in the question.
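You can check that this reproduces the loop from the question; a minimal check (loop_result is just a name for the loop's output):

import numpy as np

N = 100
some_vector = np.random.uniform(size=N)

# Loop construction from the question.
loop_result = np.zeros((N, 2*N))
for i in range(N):
    loop_result[i, 2*i] = 1
    loop_result[i, 2*i + 1] = some_vector[i]

# Variant 1: interleave identity and diagonal columns with sliced assignments.
c = np.empty((N, 2*N))
c[:, 0::2] = np.eye(N)
c[:, 1::2] = np.diag(some_vector)

assert np.array_equal(loop_result, c)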
Variant 2
Similar idea to Variant 1, but this only performs a single allocation and avoids unnecessary assignments to zero entries.
We create a single vector of size 2*N**2, which essentially represents the rows of some_matrix concatenated with one another. The non-zero positions are populated using broadcasting assignments. We then create an (N, 2*N) view into this vector using np.ndarray.reshape.
some_matrix = np.zeros(2 * N**2)
step = 2 * (N + 1)
some_matrix[::step] = 1
some_matrix[1::step] = some_vector
some_matrix = some_matrix.reshape(N, 2 * N)
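To see why step = 2 * (N + 1) works: row i of some_matrix starts at flat offset 2*N*i, and its 1 sits at column 2*i, i.e. at flat position 2*N*i + 2*i = 2*(N + 1)*i; the vector entry sits one position later. A small check of this layout:

import numpy as np

N = 3
some_vector = np.random.uniform(size=N)

flat = np.zeros(2 * N**2)
step = 2 * (N + 1)          # = 8 for N = 3, so the 1s land at flat positions 0, 8, 16
flat[::step] = 1
flat[1::step] = some_vector
some_matrix = flat.reshape(N, 2 * N)

# The 1s end up at (i, 2*i) and the vector entries at (i, 2*i + 1).
rows = np.arange(N)
assert np.array_equal(some_matrix[rows, 2 * rows], np.ones(N))
assert np.array_equal(some_matrix[rows, 2 * rows + 1], some_vector)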
Variant 2 is the fastest option in the benchmarks below. For relatively small N (N=100), it is several times faster than the original loop; for large N (N=10_000), the margin shrinks but it still wins. Benchmark code:
import numpy as np

def original(n):
    some_vector = np.random.uniform(size=n)
    some_matrix = np.zeros((n, 2 * n))
    for i in range(n):
        some_matrix[i, 2 * i] = 1
        some_matrix[i, 2 * i + 1] = some_vector[i]

def variant_1(n):
    some_vector = np.random.uniform(size=n)
    a = np.eye(n)
    b = np.diag(some_vector)
    c = np.empty((n, 2 * n))
    c[:, 0::2] = a
    c[:, 1::2] = b

def variant_2(n):
    some_vector = np.random.uniform(size=n)
    some_matrix = np.zeros(2 * n**2)
    step = 2 * (n + 1)
    some_matrix[::step] = 1
    some_matrix[1::step] = some_vector
    some_matrix = some_matrix.reshape(n, 2 * n)
N=100
%timeit -n1000 original(100)
21.1 µs ± 2.22 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit -n1000 variant_1(100)
12.2 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit -n1000 variant_2(100)
4.37 µs ± 21.4 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
N=1_000
%timeit -n100 original(1_000)
631 µs ± 98.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit -n100 variant_1(1_000)
5.24 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit -n100 variant_2(1_000)
408 µs ± 12.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
N=10_000
%timeit -n100 original(10_000)
12.6 ms ± 225 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit -n100 variant_2(10_000)
10.1 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Tested using Python 3.10.12 and Numpy v1.26.0.
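For the (N, 2*N, T) extension in the question's edit, the same trick as Variant 2 should carry over: flatten the first two axes, fill the non-zero rows with strided slice assignments (the 1s broadcast across T), then reshape. A minimal sketch, not benchmarked:

import numpy as np

N, T = 10, 500
some_vector = np.random.uniform(size=(N, T))

# Flatten the first two axes: row i of the (N, 2*N) layout occupies
# flat rows [2*N*i, 2*N*(i + 1)), so the non-zero rows are spaced 2*(N + 1) apart.
flat = np.zeros((2 * N**2, T))
step = 2 * (N + 1)
flat[::step] = 1                 # rows holding the 1s, broadcast across T
flat[1::step] = some_vector      # rows holding some_vector[i, :]
some_matrix = flat.reshape(N, 2 * N, T)

# Sanity check against the naive double loop from the question.
check = np.zeros((N, 2 * N, T))
for i in range(N):
    check[i, 2 * i, :] = 1
    check[i, 2 * i + 1, :] = some_vector[i, :]
assert np.array_equal(some_matrix, check)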