I'm trying to set up a train_test_split with data I have read from a CSV into a pandas DataFrame. The book I am reading says I should separate it into X_train as the data and y_train as the target, but how can I define which column is the target and which columns are the data? So far I have the following:
import pandas as pd
from sklearn.model_selection import train_test_split
Data = pd.read_csv("Data.csv")
I have read that the split should be done in the following way; however, that example used a Bunch where data and target were already defined:
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)
You can do it like this:
Data = pd.read_csv("Data.csv")
# Features: every column except the target
X = Data.drop(['name of the target column'], axis=1).values
# Target: the single column you want to predict
y = Data['name of the target column'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
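To make the name-based approach concrete, here is a minimal sketch you can run without a CSV file. The small DataFrame stands in for pd.read_csv("Data.csv"), and the column names (age, income, purpose) are invented for illustration; substitute your own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.read_csv("Data.csv"); column names are invented.
Data = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 36],
    "income": [30, 45, 80, 62, 55, 41, 73, 50],
    "purpose": ["leisure", "business", "business", "leisure",
                "business", "leisure", "business", "leisure"],
})

target_column = "purpose"  # assumed name of the column to predict

X = Data.drop(columns=[target_column])  # everything except the target
y = Data[target_column]                 # only the target

# With the default test_size of 0.25, 8 rows split into 6 train and 2 test.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(X_train.shape, X_test.shape)
```

Here drop(columns=[...]) is equivalent to drop([...], axis=1); it just states the intent a little more directly.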
In many datasets the target variable is the last column, so if that holds for yours you can also select by position:
Data = pd.read_csv("Data.csv")
X = Data.iloc[:, :-1]  # all columns except the last
y = Data.iloc[:, -1]   # the last column
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
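One difference from the first version is worth a quick check: without .values, X and y stay pandas objects, and train_test_split hands back pandas objects too. A rough sketch, again using made-up data in place of the CSV:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data; the last column plays the role of the target.
Data = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "feature_b": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],
})

X = Data.iloc[:, :-1]  # DataFrame with all columns except the last
y = Data.iloc[:, -1]   # Series holding the last column

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# pandas inputs come back as pandas objects, and the rows of X_train
# and y_train keep the same index labels, so they stay aligned.
print(type(X_train).__name__, type(y_train).__name__)
print(X_train.index.equals(y_train.index))
```

Keeping the DataFrame preserves column names, which can be handy for inspection later; appending .values gives plain NumPy arrays instead.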