I have a pandas DataFrame indexed by date. Let's assume it runs from Jan-1 to Jan-30. I want to split this dataset into X_train, X_test, y_train, y_test, but I don't want to mix the dates: the train and test samples should be divided at a certain date (or index). I'm trying:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
But when I check the values, I see the dates are mixed. I want to split my data as Jan-1 to Jan-24 for training and Jan-25 to Jan-30 for testing (since test_size is 0.2, that makes 24 days for training and 6 for testing).
How can I do this?
You should use:
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False, test_size=0.2, stratify=None)
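A minimal sketch to check this on a date-indexed frame. The data, the column names, and the year 2021 are made up for illustration; only train_test_split with shuffle=False and test_size=0.2 comes from the answer.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical example data: one row per day from Jan-1 to Jan-30 (year is arbitrary).
dates = pd.date_range("2021-01-01", "2021-01-30", freq="D")
X = pd.DataFrame({"feature": np.arange(30)}, index=dates)
y = pd.Series(np.arange(30) * 2, index=dates, name="target")

# shuffle=False keeps the original (chronological) row order, so the first 80%
# of rows become the training set and the last 20% the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=False, test_size=0.2
)

print(X_train.index.min().date(), X_train.index.max().date())  # 2021-01-01 2021-01-24
print(X_test.index.min().date(), X_test.index.max().date())    # 2021-01-25 2021-01-30
```

The split is positional, so this relies on the frame already being sorted by date; call X.sort_index() first if it might not be.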
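You don't need random_state here: it only controls the shuffling, so with shuffle=False it has no effect. (With shuffle=True and random_state=None, the split uses the global numpy.random state and is not reproducible between runs.)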
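A small sketch, using hypothetical NumPy data, showing that with shuffle=False two calls with different seeds return the same positional split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 30 samples already in time order.
X = np.arange(30).reshape(-1, 1)
y = np.arange(30)

# With shuffle=False the split is purely positional, so random_state is not used.
_, X_test_a, _, _ = train_test_split(X, y, shuffle=False, test_size=0.2, random_state=0)
_, X_test_b, _, _ = train_test_split(X, y, shuffle=False, test_size=0.2, random_state=42)

print(X_test_a.ravel())                    # [24 25 26 27 28 29]
print(np.array_equal(X_test_a, X_test_b))  # True
```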
The train_test_split documentation also notes that when shuffle=False, stratify must be None.
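If you'd rather cut at an explicit date than at a fraction, plain pandas boolean indexing on the DatetimeIndex also works. This is a swapped-in alternative, not the method from the answer; the data, the year, and the cutoff date below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical example data indexed by date (Jan-1 to Jan-30; the year is arbitrary).
dates = pd.date_range("2021-01-01", periods=30, freq="D")
X = pd.DataFrame({"feature": np.arange(30)}, index=dates)
y = pd.Series(np.arange(30) * 2, index=dates, name="target")

# Make sure rows are in chronological order before cutting.
X = X.sort_index()
y = y.sort_index()

# Rows strictly before the cutoff go to training; the cutoff day and later go to testing.
cutoff = pd.Timestamp("2021-01-25")
X_train, X_test = X[X.index < cutoff], X[X.index >= cutoff]
y_train, y_test = y[y.index < cutoff], y[y.index >= cutoff]
```

This makes the boundary date explicit instead of deriving it from test_size, which can be handy when new data keeps arriving and the proportion would otherwise drift.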