I have a dataset with the columns reason and issue.
I wanted to one-hot encode them like this:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
reason_no_enc = enc.fit_transform(temp['REASON NO'].values.reshape(-1, 1)).toarray()
issue_enc = enc.fit_transform(temp['Issue'].values.reshape(-1, 1)).toarray()
But I realized this causes a problem: the second call refits the encoder, so it only remembers the categories of issue_enc. When I try to inverse-transform reason_no_enc, it raises an error.
How should I handle this?
You have to use a separate OHE instance for each column, like this:
# fit encoder using 'REASON NO' data
# later use this instance of OHE to decode 'REASON NO' data
ohe_reason = OneHotEncoder()
reason_no_enc = ohe_reason.fit_transform(temp['REASON NO'].values.reshape(-1, 1)).toarray()
# fit encoder using 'Issue' data
# later use this instance of OHE to decode 'Issue' data
ohe_issue = OneHotEncoder()
issue_enc = ohe_issue.fit_transform(temp['Issue'].values.reshape(-1, 1)).toarray()
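To show the decoding step the comments above refer to, here is a minimal sketch. The small temp DataFrame and its values are made up for illustration; only the column names come from the question.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the asker's `temp` DataFrame
temp = pd.DataFrame({
    'REASON NO': [1, 2, 1, 3],
    'Issue': ['late', 'damaged', 'late', 'missing'],
})

# One encoder per column, as in the answer
ohe_reason = OneHotEncoder()
reason_no_enc = ohe_reason.fit_transform(temp['REASON NO'].values.reshape(-1, 1)).toarray()

ohe_issue = OneHotEncoder()
issue_enc = ohe_issue.fit_transform(temp['Issue'].values.reshape(-1, 1)).toarray()

# Decode each array with the encoder that produced it;
# inverse_transform returns shape (n_samples, 1), so flatten it
reason_decoded = ohe_reason.inverse_transform(reason_no_enc).ravel()
issue_decoded = ohe_issue.inverse_transform(issue_enc).ravel()
```

Because each encoder keeps its own fitted categories, decoding one column no longer depends on which column was fitted last.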
Alternatively, you can use a single OHE instance for both columns, like this:
enc = OneHotEncoder()
encoded_arr = enc.fit_transform(temp[['REASON NO', 'Issue']])
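With a single encoder fitted on both columns, inverse_transform should give back both columns at once. A small sketch, again using a made-up temp DataFrame (the values are assumptions, not the asker's data):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the asker's `temp` DataFrame
temp = pd.DataFrame({
    'REASON NO': [1, 2, 1, 3],
    'Issue': ['late', 'damaged', 'late', 'missing'],
})

# One encoder fitted on both columns together
enc = OneHotEncoder()
encoded_arr = enc.fit_transform(temp[['REASON NO', 'Issue']]).toarray()

# The encoder stores one category array per input column
n_reason = len(enc.categories_[0])
n_issue = len(enc.categories_[1])

# Decoding returns a 2D array with one column per original column
decoded = enc.inverse_transform(encoded_arr)
reason_decoded = list(decoded[:, 0])
issue_decoded = list(decoded[:, 1])
```

The trade-off is that the encoded columns of both features live in one array, so you need enc.categories_ to tell which one-hot columns belong to which feature.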