Here is a snippet of my df:
0 1 2 3 4 5 ... 11 12 13 14 15 16
0 BSO PRV BSI TUR WSP ACP ... HLR HEX HEX None None None
1 BSO PRV BSI TUR WSP ACP ... HLF HLR HEX HEX HEX None
2 BSO PRV BSI HLF HLR TUR ... HEX RSO RSI HEX HEX HEX
3 BSO PRV BSI HLF HLR TUR ... RSO RSI HEX HEX HEX None
4 BSO PRV BSI HLF TUR WSP ... RSO RSI HLR HEX HEX HEX
... ... ... ... ... ... ... ... ... ... ... ... ...
32607 BSO PRV BSI TUR WSP ACP ... HEX None None None None None
32608 BSO PRV BSI TUR WSP ACP ... HEX None None None None None
32609 BSO PRV BSI TUR WSP ACP ... HEX None None None None None
32610 BSO PRV BSI TUR WSP ACP ... HEX None None None None None
32611 BSO PRV BSI TUR WSP ACP ... HEX None None None None None
Each cell is a string, and I want to label encode the values so that each string gets the same integer in every row, for example all BSO = 1, all PRV = 2, etc. The actual values do not matter as long as they are consistent. I would like to exclude the None
value if possible, but if not, that's OK.
I tried df.apply(le.fit_transform)
and the result was:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 0 0 0 2 2 0 1 1 3 2 1 2 0 0 1 1 1
1 0 0 0 2 2 0 1 1 1 3 3 1 2 0 0 0 1
2 0 0 0 0 0 1 2 4 0 0 0 0 4 3 0 0 0
3 0 0 0 0 0 1 3 0 1 0 0 4 3 0 0 0 1
4 0 0 0 0 1 2 2 0 1 0 0 4 3 2 0 0 0
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
32607 0 0 0 2 2 0 1 2 2 1 2 0 5 4 1 1 1
32608 0 0 0 2 2 0 1 2 2 1 2 0 5 4 1 1 1
32609 0 0 0 2 2 0 1 2 2 1 2 0 5 4 1 1 1
32610 0 0 0 2 2 0 1 2 2 1 2 0 5 4 1 1 1
32611 0 0 0 2 2 0 1 2 2 1 2 0 5 4 1 1 1
As you can see, the integers do not map consistently to the strings within each row.
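It looks like the problem is that you applied the transform to each column (the default behaviour, axis=0). Try:
df.apply(le.fit_transform, axis=1)
The axis=1 argument results in le.fit_transform being applied to each row. Note that the encoder is then refit on every row, so the codes are consistent within a row but not necessarily from one row to the next.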
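If what you want is a single mapping shared by the whole frame (every BSO gets the same integer wherever it appears), one option that does not use LabelEncoder at all is pd.factorize on the flattened values. This is only a minimal sketch: the toy frame below is made up for illustration, and it assumes the empty cells are real None/NaN values rather than the string 'None'. Under that assumption factorize marks them with -1, which makes them easy to exclude afterwards.

```python
import pandas as pd

# Toy frame standing in for the real df (values are illustrative only).
df = pd.DataFrame([
    ["BSO", "PRV", "HEX", None],
    ["BSO", "HLF", "PRV", "HEX"],
])

# Flatten all cells into one 1-D array (row by row) so that a single
# mapping is learned for the whole frame, not one per row or column.
flat = df.to_numpy().ravel()

# factorize assigns codes in order of first appearance;
# missing values (None/NaN) get the sentinel code -1.
codes, uniques = pd.factorize(flat)

# Put the codes back into the original shape and labels.
encoded = pd.DataFrame(codes.reshape(df.shape),
                       index=df.index, columns=df.columns)

print(encoded)
print(list(uniques))
```

uniques[code] turns a code back into its string, so the mapping can be inspected or reused later.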
Hope it helps.