python pandas one-hot-encoding

How can I do one-hot encoding from multiple columns?


When I search for this topic, I get answers that do not match what I want to do. Let's say I have a table like this:

Item   N1  N2  N3  N4
Item1   1   2   4   8
Item2   2   3   6   7
Item3   4   5   7   9
Item4   1   5   6   7
Item5   3   4   7   8

I would like to one-hot encode this to get:

Item   1  2  3  4  5  6  7  8  9
Item1  1  1  0  1  0  0  0  1  0
Item2  0  1  1  0  0  1  1  0  0
Item3  0  0  0  1  1  0  1  0  1
Item4  1  0  0  0  1  1  1  0  0
Item5  0  0  1  1  0  0  1  1  0

Is this feasible at all? I am currently writing a loop that goes through each row, but I wanted to ask whether anyone knows a more efficient way to do this.


Solution

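  • Use melt and crosstab: melt stacks N1 to N4 into a single value column, and crosstab then counts each (Item, value) pair.

    import pandas as pd

    # df is the table from the question (columns Item, N1, N2, N3, N4)
    tmp = df.melt('Item')  # long format with columns Item, variable, value
    result = pd.crosstab(tmp['Item'], tmp['value']).reset_index().rename_axis(None, axis=1)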
    
        Item  1  2  3  4  5  6  7  8  9
    0  Item1  1  1  0  1  0  0  0  1  0
    1  Item2  0  1  1  0  0  1  1  0  0
    2  Item3  0  0  0  1  1  0  1  0  1
    3  Item4  1  0  0  0  1  1  1  0  0
    4  Item5  0  0  1  1  0  0  1  1  0
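
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same melt and crosstab approach. The DataFrame is rebuilt by hand from the table in the question. The `.clip(upper=1)` step is an addition of mine, not part of the answer above: crosstab returns counts, so a number appearing twice in one row would otherwise give a 2 rather than a 0/1 flag. With the sample data it changes nothing.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Item": ["Item1", "Item2", "Item3", "Item4", "Item5"],
    "N1": [1, 2, 4, 1, 3],
    "N2": [2, 3, 5, 5, 4],
    "N3": [4, 6, 7, 6, 7],
    "N4": [8, 7, 9, 7, 8],
})

# Reshape to long format: one row per (Item, value) pair.
tmp = df.melt("Item")

# Count each pair, then cap at 1 so repeated values stay 0/1 flags
# (the cap is an assumption about the desired output, see above).
result = (
    pd.crosstab(tmp["Item"], tmp["value"])
    .clip(upper=1)
    .reset_index()
    .rename_axis(None, axis=1)
)

print(result)
```

If you actually want occurrence counts rather than indicators, drop the `.clip(upper=1)` call.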