
Convert array of indices to one-hot encoded array in NumPy


Given a 1D array of indices:

a = np.array([1, 0, 3])

I want to one-hot encode this as a 2D array:

b = np.array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Solution

  • Create a zero-filled array b with a.max() + 1 columns, one per possible
    index value. Then, for each row i, set column a[i] to 1.

    >>> import numpy as np
    >>> a = np.array([1, 0, 3])
    >>> b = np.zeros((a.size, a.max() + 1))
    >>> b[np.arange(a.size), a] = 1
    
    >>> b
    array([[ 0.,  1.,  0.,  0.],
           [ 1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  1.]])
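
The session above produces a float array, while the question shows integers. As a minimal sketch, the same indexing trick can be wrapped in a small helper that takes an explicit dtype and column count. The name one_hot and its num_classes and dtype parameters are assumptions made for illustration, not part of NumPy. The helper assumes a non-empty 1D array of non-negative integers.

```python
import numpy as np

def one_hot(indices, num_classes=None, dtype=int):
    """Hypothetical helper: one-hot encode a 1D array of non-negative ints."""
    indices = np.asarray(indices)
    if num_classes is None:
        # Infer the width from the data; this fails on an empty array.
        num_classes = indices.max() + 1
    out = np.zeros((indices.size, num_classes), dtype=dtype)
    # Row i gets a 1 in column indices[i].
    out[np.arange(indices.size), indices] = 1
    return out

a = np.array([1, 0, 3])
b = one_hot(a)                      # integer array, 4 columns
b_wide = one_hot(a, num_classes=6)  # fixed width, e.g. to match other batches

# Alternative: pick rows of an identity matrix by index.
b_eye = np.eye(a.max() + 1, dtype=int)[a]
```

Passing num_classes explicitly is useful when a given batch may not contain the largest index, since a.max() + 1 would otherwise yield too few columns.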