When I search for this topic, I get answers that do not match what I want to do. Let's say I have a table like this:
| Item | N1 | N2 | N3 | N4 |
|---|---|---|---|---|
| Item1 | 1 | 2 | 4 | 8 |
| Item2 | 2 | 3 | 6 | 7 |
| Item3 | 4 | 5 | 7 | 9 |
| Item4 | 1 | 5 | 6 | 7 |
| Item5 | 3 | 4 | 7 | 8 |

I would like to one-hot encode this to get:

| Item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Item1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Item2 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| Item3 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| Item4 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| Item5 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |

Is this feasible at all? I am currently writing a loop that goes through each row, but I wanted to ask whether anyone knows a more efficient way to do this.
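
Use `melt` and `crosstab`. `melt` reshapes the table to long format, with one row per (`Item`, value) pair, and `crosstab` then counts how often each value occurs for each item:
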
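```python
import pandas as pd
tmp = df.melt('Item')  # long format: columns Item, variable, value
result = pd.crosstab(tmp['Item'], tmp['value']).reset_index().rename_axis(None, axis=1)  # counts per Item, flat columns
```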

```
    Item  1  2  3  4  5  6  7  8  9
0  Item1  1  1  0  1  0  0  0  1  0
1  Item2  0  1  1  0  0  1  1  0  0
2  Item3  0  0  0  1  1  0  1  0  1
3  Item4  1  0  0  0  1  1  1  0  0
4  Item5  0  0  1  1  0  0  1  1  0
```
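
One caveat: `crosstab` returns counts, so a value that appears twice in the same row would show up as 2 rather than 1. The sample data has no such repeats, but if yours might, a minimal self-contained sketch along these lines should keep the result strictly 0/1. The `clip` and `reindex` steps are my additions, and the 1 to 9 column range is an assumption taken from the example table:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Item": ["Item1", "Item2", "Item3", "Item4", "Item5"],
    "N1": [1, 2, 4, 1, 3],
    "N2": [2, 3, 5, 5, 4],
    "N3": [4, 6, 7, 6, 7],
    "N4": [8, 7, 9, 7, 8],
})

# Long format: one row per (Item, value) pair
tmp = df.melt("Item")

# Count occurrences of each value per Item
counts = pd.crosstab(tmp["Item"], tmp["value"])

# Cap counts at 1 so repeated values within a row still encode as 1
encoded = counts.clip(upper=1)

# Assumption: the full set of categories is 1..9; values that never
# occur in the data still get an all-zero column
encoded = encoded.reindex(columns=range(1, 10), fill_value=0)

# Turn Item back into a regular column and drop the columns-axis name
result = encoded.reset_index().rename_axis(None, axis=1)
print(result)
```

If the set of possible values is not known in advance, the `reindex` line can simply be dropped, and the columns will be whatever values occur in the data.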