I have a dataset with 39 categorical and 27 numerical features. I am trying to encode the categorical data, and I need to be able to inverse transform each column and call transform on it again later. Is there a prettier way of doing this than defining 39 separate LabelEncoder instances and then calling fit_transform on each column individually?
I feel like I am missing something obvious, but I can't figure it out!
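import numpy as np
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
# names of all categorical (object dtype) columns
cat_feat = [col for col in input_df2.columns if input_df2[col].dtype == 'object']
cat_feat = np.asarray(cat_feat)
le1 = LabelEncoder()
le2 = LabelEncoder()
le3 = LabelEncoder()
...
# extended to le39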
def label(input):
    # one encoder per column, so each column can be inverse transformed later
    input.iloc[:, 1] = le1.fit_transform(input.iloc[:, 1])
    input.iloc[:, 3] = le2.fit_transform(input.iloc[:, 3])
    input.iloc[:, 4] = le3.fit_transform(input.iloc[:, 4])
    ...
    return input
DataFrame.apply is meant for exactly this. It calls the specified function for each column of the dataframe (or for each row, if you pass it axis=1):
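encoders = []

def apply_label_encoder(col):
    le = LabelEncoder()
    # keep the fitted encoder so the column can be transformed or inverse transformed later
    encoders.append(le)
    # return the encoded values so that apply can build the result from them
    return le.fit_transform(col)

input_df.iloc[:, 1:] = input_df.iloc[:, 1:].apply(apply_label_encoder)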
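Since the question also asks for inverse transform and for calling transform again per column, here is a minimal sketch of one way to do that: keep the fitted encoders in a dict keyed by column name instead of a list. The toy frame `df`, its column names, and the hand-written `cat_feat` list are made up for illustration; in the question, `cat_feat` comes from the dtype check.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real dataframe
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
})

# Assumption: the categorical columns are listed by hand here
cat_feat = ["color", "size"]

# Fit one LabelEncoder per categorical column and remember it by column name
encoders = {}
encoded = df.copy()
for col in cat_feat:
    le = LabelEncoder()
    encoded[col] = le.fit_transform(df[col])
    encoders[col] = le

# Call transform again later with the already fitted encoder
new_codes = encoders["color"].transform(["green", "red"])

# Map the integer codes back to the original labels
decoded = encoded.copy()
for col in cat_feat:
    decoded[col] = encoders[col].inverse_transform(encoded[col])
```

Looking encoders up by column name avoids having to remember in which order they were appended, which matters once there are 39 of them.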