My question is close to "How to create a train_test_split based on a conditional in python", but I am looking for a better solution.
I have a pandas dataframe that I would normally split with the train_test_split function:
X_train, X_test, y_train, y_test = train_test_split(data[xvars], data[yvar], train_size=0.98, random_state=42)
However, I would like to split based on my pandas column called week: rows with week < 51 should go to the train set and rows with week >= 51 to the test set. How can I achieve this efficiently?
Thanks.
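Because the split rule is a plain condition on one column, a boolean mask on week gives the split directly, without train_test_split. This is a minimal sketch that assumes hypothetical toy data (one row per week, 1 to 52) and hypothetical names for the feature list (xvars) and target column (yvar); substitute your own.

```python
import pandas as pd

# Hypothetical toy data: one row per week, weeks 1..52.
data = pd.DataFrame({
    "week": range(1, 53),
    "x1": [float(i) for i in range(52)],
    "y": [i % 2 for i in range(52)],
})
xvars = ["x1"]  # hypothetical list of feature columns
yvar = "y"      # hypothetical target column

# True for rows in the training period (week < 51), False for week >= 51.
is_train = data["week"] < 51

X_train = data.loc[is_train, xvars]
y_train = data.loc[is_train, yvar]
X_test = data.loc[~is_train, xvars]
y_test = data.loc[~is_train, yvar]

print(len(X_train), len(X_test))  # 50 2
```

The mask is computed once and reused, and the split follows the week values exactly, regardless of row order or how many rows fall in each week.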
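The solution is described in the documentation: first sort the dataframe by week, then call train_test_split with shuffle=False and stratify=None, so the rows are split in their sorted order and the last rows become the test set. Note that the split point is set by test_size (0.4 below), so the test set matches week >= 51 only if test_size equals the share of rows in those weeks.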
X_train, X_test, y_train, y_test = train_test_split(X,Y, shuffle=False, test_size=0.4, stratify=None)
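To make the sorted, unshuffled split line up exactly with week >= 51, one option is to pass test_size as an integer row count (train_test_split accepts an int as the absolute number of test samples) computed from the week column. A minimal sketch, assuming hypothetical unsorted toy data and a hypothetical feature column x1:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data, deliberately unsorted (weeks 52 down to 1).
weeks = list(range(52, 0, -1))
data = pd.DataFrame({
    "week": weeks,
    "x1": [w * 0.5 for w in weeks],
    "y": [w % 2 for w in weeks],
})

# Sort so that all week >= 51 rows end up at the bottom.
data = data.sort_values("week")

# Number of rows that belong to the test period.
n_test = int((data["week"] >= 51).sum())

# With shuffle=False, the last n_test rows become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    data[["x1"]], data["y"], test_size=n_test, shuffle=False, stratify=None
)

print(len(X_train), len(X_test))  # 50 2
```

Using an integer count instead of a fraction avoids rounding making the split land one row off from the week boundary.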