I'm a programmer trying to find his way into the ML world, so the question might be basic.
I have data from the years 2010-2019. I'm trying to test different parameters for gradient boosting regression, and I want to use 60% for training, 20% for validation and 20% for testing. Due to the nature of the question I'm trying to answer, I have already split the data by year into train_df (2010 to 2014), evaluate_df (2015 to 2017) and test_df (2018 to 2019).
The model should be trained on train_df and evaluated on evaluate_df; finally, I use the best model on the test dataframe test_df.
This is my code:
p_test3 = {'learning_rate':[0.1,0.05,0.01,0.005], 'n_estimators':[500,750,1000,1250,1500]}
tuning = GridSearchCV(estimator =GradientBoostingRegressor( min_samples_split=2, min_samples_leaf=1, subsample=1,max_features='sqrt', random_state=10),
param_grid = p_test3, scoring='r2',n_jobs=-1, cv=evaluate_df)
tuning.fit(train_df[['col1']], train_df['col2'])
tuning.cv_results_, tuning.best_params_, tuning.best_score_
But I got this error:
ValueError: too many values to unpack (expected 2)
How can I make GridSearchCV evaluate the models on a separate dataframe?
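The two dataframes should be combined, and a list generated with one entry per row: -1 for training rows (PredefinedSplit never puts these in a validation set) and 0 for the rows of the validation fold. Then pass the resulting PredefinedSplit to cv. Note that marking training rows with 0 and validation rows with 1 would create two folds, one of which trains on evaluate_df and validates on train_df.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, PredefinedSplit
combined_df = pd.concat([train_df, evaluate_df])
# -1 = always in the training set, 0 = validation fold
test_fold = [-1] * len(train_df) + [0] * len(evaluate_df)
p_test3 = {'learning_rate':[0.1,0.05,0.01,0.005], 'n_estimators':[500,750,1000,1250,1500]}
ps = PredefinedSplit(test_fold=test_fold)
tuning = GridSearchCV(estimator=GradientBoostingRegressor(min_samples_split=2, min_samples_leaf=1, subsample=1.0, max_features='sqrt', random_state=10),
param_grid=p_test3, scoring='r2', n_jobs=-1, cv=ps)
tuning.fit(combined_df[['col1']], combined_df['col2'])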
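To check that the split behaves as intended, here is a minimal end-to-end sketch on synthetic data. The generated dataframe, its `year` column, the linear relation between `col1` and `col2`, and the reduced parameter grid are all assumptions made for illustration; only the train/evaluate/test year ranges come from the question. It uses `refit=False` and then fits the final model on `train_df` alone, which is one way to match the stated goal of training only on 2010-2014.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Synthetic stand-in for the real data (assumption): one row per day, 2010-2019.
rng = np.random.default_rng(0)
dates = pd.date_range("2010-01-01", "2019-12-31", freq="D")
df = pd.DataFrame({"year": dates.year, "col1": rng.normal(size=len(dates))})
df["col2"] = 3.0 * df["col1"] + rng.normal(scale=0.1, size=len(df))

# Year-based split, as described in the question.
train_df = df[df["year"] <= 2014]
evaluate_df = df[(df["year"] >= 2015) & (df["year"] <= 2017)]
test_df = df[df["year"] >= 2018]

# -1 keeps a row in the training set; 0 assigns it to the single validation fold.
combined_df = pd.concat([train_df, evaluate_df])
test_fold = [-1] * len(train_df) + [0] * len(evaluate_df)
ps = PredefinedSplit(test_fold=test_fold)

# Small grid so the sketch runs quickly; the real grid would be larger.
param_grid = {"learning_rate": [0.1, 0.05], "n_estimators": [50, 100]}
tuning = GridSearchCV(
    estimator=GradientBoostingRegressor(max_features="sqrt", random_state=10),
    param_grid=param_grid,
    scoring="r2",
    cv=ps,
    refit=False,  # do not retrain on train + validation rows
)
tuning.fit(combined_df[["col1"]], combined_df["col2"])

# Retrain the best configuration on train_df only, then score once on test_df.
final_model = GradientBoostingRegressor(
    max_features="sqrt", random_state=10, **tuning.best_params_
)
final_model.fit(train_df[["col1"]], train_df["col2"])
test_r2 = final_model.score(test_df[["col1"]], test_df["col2"])
print(tuning.best_params_, tuning.best_score_, test_r2)
```

With the default `refit=True`, GridSearchCV would instead retrain the best configuration on all of `combined_df`, and `tuning.score(...)` could be called on the test data directly; which of the two is preferable depends on whether the validation years should contribute to the final model.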