I think I'm missing something in the code below.
import pandas as pd
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
# Split into training and test sets
X = df[['Spam']]   # message text
y = df['Value']    # label (1 = spam)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=40)
X_resampled, y_resampled = SMOTE().fit_resample(X_train, y_train)
sm = pd.concat([X_resampled, y_resampled], axis=1)
as I'm getting the following error:
ValueError: could not convert string to float:
---> 19 X_resampled, y_resampled = SMOTE().fit_resample(X_train, y_train)
Here is an example of the data:
Spam                                      Value
Your microsoft account was compromised    1
Manchester United lost against PSG        0
I like cooking                            0
I think I need to transform both the train and test sets to fix the issue causing the error, but I don't know how to apply the transformation to both. I've tried some examples I found on Google, but they haven't fixed the issue.
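Convert the text data to numeric features before applying SMOTE. SMOTE creates synthetic samples by interpolating between numeric feature vectors, so it cannot work on raw strings. Fit the vectorizer on the training set only, then use it to transform both the training and test sets, like below: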
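import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
# Learn the vocabulary from the training text only
vectorizer = CountVectorizer()
vectorizer.fit(X_train.values.ravel())
# Apply the same vocabulary to both sets so they get identical columns
X_train = vectorizer.transform(X_train.values.ravel())
X_test = vectorizer.transform(X_test.values.ravel())
# Convert the sparse count matrices to dense arrays
X_train = X_train.toarray()
X_test = X_test.toarray()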
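The point of calling fit once and transform twice is that the test set is mapped onto the vocabulary learned from the training set, so both matrices have the same columns. A small sketch, using two sentences from the question's example data as hypothetical training text and the third as test text, shows this:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Sentences taken from the example data in the question (split chosen for illustration)
train_texts = ["Your microsoft account was compromised", "I like cooking"]
test_texts = ["Manchester United lost against PSG"]

vectorizer = CountVectorizer()
# Learn the vocabulary from the training text only
X_train = vectorizer.fit_transform(train_texts)
# Reuse that vocabulary for the test text; words never seen in training get no column
X_test = vectorizer.transform(test_texts)

print(X_train.shape, X_test.shape)  # prints (2, 7) (1, 7)
```

Both matrices have 7 columns. The test row is all zeros here because none of its words occur in the training text, and the one-letter word "I" is dropped by CountVectorizer's default tokenizer.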
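Then run your SMOTE code. Resample only the training set; the test set should keep its original class distribution.
X_train = pd.DataFrame(X_train)  # wrap in a DataFrame so SMOTE returns one that pd.concat can join
X_resampled, y_resampled = SMOTE().fit_resample(X_train, y_train)
sm = pd.concat([X_resampled, y_resampled], axis=1)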