When I search for this topic, I get answers that do not match what I want to do. Let's say I have a table like this:
Item | N1 | N2 | N3 | N4 |
---|---|---|---|---|
Item1 | 1 | 2 | 4 | 8 |
Item2 | 2 | 3 | 6 | 7 |
Item3 | 4 | 5 | 7 | 9 |
Item4 | 1 | 5 | 6 | 7 |
Item5 | 3 | 4 | 7 | 8 |
I would like to one-hot encode this to get:
Item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Item1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
Item2 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
Item3 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
Item4 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
Item5 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
Is this feasible at all? I am now in the process of coding some sort of loop to go through each line but I decided to ask if anyone knows a more efficient way to do this.
Use melt and crosstab.
tmp = df.melt('Item')
result = pd.crosstab(tmp['Item'], tmp['value']).reset_index().rename_axis(None, axis=1)
Item 1 2 3 4 5 6 7 8 9
0 Item1 1 1 0 1 0 0 0 1 0
1 Item2 0 1 1 0 0 1 1 0 0
2 Item3 0 0 0 1 1 0 1 0 1
3 Item4 1 0 0 0 1 1 1 0 0
4 Item5 0 0 1 1 0 0 1 1 0