I'm trying to generate N data points for three jointly normal random variables in Python. When I run the following code:
import numpy as np
import scipy
import pandas
import sys
from scipy.linalg import block_diag
from pandas import *
N=100
Sigma=np.identity(3)
Mu=np.zeros((3,1))
Z=np.random.multivariate_normal(Mu, Sigma, N)
I get the following error message:
in <module>
Z=np.random.multivariate_normal(Mu, Sigma, N)
File "mtrand.pyx", line 4067, in numpy.random.mtrand.RandomState.multivariate_normal
ValueError: mean must be 1 dimensional
This means that the dimension of np.zeros((3,1)) is not 1. After changing the line Mu=np.zeros((3,1)) to Mu=np.zeros(3), it works. This implies that np.zeros(3) is 1-dimensional.
As np.zeros(3) and np.zeros((3,1)) are both arrays of three zeros, I assumed that both would naturally be 1-dimensional. Checking Mu.ndim in each case, I found that the dimension of np.zeros(3) is one and the dimension of np.zeros((3,1)) is two. My question is:
Why does NumPy make a distinction between np.zeros((3,1)) and np.zeros(3) regarding their dimensions (why is this distinction useful)?
It's normal for them to have different dimensions. np.zeros(3) is a single array of 3 zeros, while np.zeros((3,1)) is an array of 3 rows, each of which is itself an array holding 1 zero.
If you print Mu[0] in your example, you will get a one-element array [0.], while if you print Mu[0] after using np.zeros(3) to define it, you will get the scalar 0.0.
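The difference can be checked directly. The following is a minimal sketch; the names flat, col and Z are only illustrative, and flattening with ravel() is one possible way to reuse a column-shaped mean, not the only one.

```python
import numpy as np

flat = np.zeros(3)       # shape (3,): a single axis of length 3
col = np.zeros((3, 1))   # shape (3, 1): 3 rows, 1 column

print(flat.shape, flat.ndim)  # (3,) 1
print(col.shape, col.ndim)    # (3, 1) 2

print(flat[0])  # 0.0, a scalar
print(col[0])   # [0.], a 1-D array holding the first row

# Flattening the column gives the 1-D mean that multivariate_normal expects.
Z = np.random.multivariate_normal(col.ravel(), np.identity(3), 100)
print(Z.shape)  # (100, 3)
```

Each row of Z is one sample of the three variables, so N samples come back with shape (N, 3).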
I can think of cases where this distinction is useful, especially when working with features in machine learning. If I have a sequence of features of size 1, I would want to use a shape of [n,1] and not [n], because that helps the model (say, an LSTM) distinguish between the sequence length and the feature size.
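As a rough illustration of the sequence-versus-feature point, here is a small sketch using plain NumPy rather than an actual model. The array names and the squared second feature are made up for the example.

```python
import numpy as np

seq = np.arange(5.0)          # shape (5,): 5 values, no explicit feature axis
seq_col = seq.reshape(-1, 1)  # shape (5, 1): 5 time steps, 1 feature each

# Only the 2-D form can be unpacked into (steps, features).
n_steps, n_features = seq_col.shape
print(n_steps, n_features)  # 5 1

# With an explicit feature axis, adding a second feature is a
# concatenation along that axis, and the step count stays the same.
two_features = np.concatenate([seq_col, seq_col ** 2], axis=1)
print(two_features.shape)  # (5, 2)
```

With the flat shape (5,), there would be no way to tell from the shape alone whether the array holds 5 steps of 1 feature or 1 step of 5 features.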