A portion of my dataset looks like this (there are many other processor types in my actual data)
df.head(4)
Processor  Task  Difficulty  Time
i3         34    3           6
i7         34    3           4
i3         50    1           6
i5         25    2           5
I have created a regression model to predict Time when Processor, Task and Difficulty are given as inputs. I first applied label encoding to the Processor column, which is categorical.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['Processor'] = le.fit_transform(df['Processor'])
df.head(4)
Processor  Task  Difficulty  Time
12         34    3           6
8          34    3           4
12         50    1           6
2          25    2           5
This is my regression model
from sklearn.ensemble import RandomForestRegressor
rf_model = RandomForestRegressor(random_state = 1)
rf_model.fit(features,target)
I want to predict Time for the input "i5", 20, 1. How can I label encode "i5" so that it maps to the same value as in my encoded dataframe, where i5 is encoded as 2?
I tried this:
rf_model.predict([[le.fit_transform('i5'),20,1]])
However, I got a prediction different from the one I get when I enter i5 as 2 directly:
rf_model.predict([[2,20,1]])
It doesn't work because you are using fit_transform. That refits the encoder and reassigns the categories instead of reusing the existing encoding; if you use le.transform instead, it should work. For example, with data like yours:
import numpy as np
import pandas as pd

np.random.seed(111)
df = pd.DataFrame({'Processor':np.random.choice(['i3','i5','i7'],50),
                   'Task':np.random.randint(25,50,50),
                   'Difficulty':np.random.randint(1,4,50),
                   'Time':np.random.randint(1,7,50)})
We make the features and target, then fit:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
features = df.iloc[:,:3].copy()  # copy to avoid a SettingWithCopyWarning
features['Processor'] = le.fit_transform(features['Processor'])
target = df['Time']
from sklearn.ensemble import RandomForestRegressor
rf_model = RandomForestRegressor(random_state = 1)
rf_model.fit(features,target)
'i5' is encoded as 1, since the classes are sorted alphabetically:
le.classes_
array(['i3', 'i5', 'i7'], dtype=object)
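To see the difference directly, here is a minimal sketch: transform reuses the classes the encoder was fitted on, while fit_transform refits on whatever you pass it.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(['i3', 'i5', 'i7'])

# transform() reuses the fitted classes, so 'i5' maps to 1
print(le.transform(['i5']))      # [1]

# fit_transform() refits the encoder on the new data alone,
# so 'i5' becomes the only class and maps to 0
print(le.fit_transform(['i5']))  # [0]
```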
Check predictions:
rf_model.predict([[le.transform(['i5'])[0],20,1]])
array([3.975])
And:
rf_model.predict([[1,20,1]])
array([3.975])
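For convenience, you could wrap the encoding and prediction in a small helper; the function name predict_time is just something I made up for this sketch, and it repeats the setup above so it runs on its own:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor

np.random.seed(111)
df = pd.DataFrame({'Processor': np.random.choice(['i3', 'i5', 'i7'], 50),
                   'Task': np.random.randint(25, 50, 50),
                   'Difficulty': np.random.randint(1, 4, 50),
                   'Time': np.random.randint(1, 7, 50)})

le = LabelEncoder()
features = df[['Processor', 'Task', 'Difficulty']].copy()
features['Processor'] = le.fit_transform(features['Processor'])

rf_model = RandomForestRegressor(random_state=1)
rf_model.fit(features, df['Time'])

def predict_time(processor, task, difficulty):
    # transform (not fit_transform) reuses the encoding learned from df;
    # [0] unwraps the single-element array into a plain scalar
    code = le.transform([processor])[0]
    row = pd.DataFrame([[code, task, difficulty]], columns=features.columns)
    return rf_model.predict(row)[0]

print(predict_time('i5', 20, 1))
```

Passing the new row as a DataFrame with the same column names keeps it consistent with how the model was fitted.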