Hello Stack Overflow community,
I am running into a problem when fitting a Support Vector Classifier (SVC) on data loaded from a CSV file. Here is the link to the CSV file (download it to view it properly). The file has two columns: "features" and "labels". Each entry in the "features" column is a fairly long array (vector) of values, and the "labels" column has two classes: "Controlled" and "Abnormal". When I run my code I get a ValueError with the message "could not convert string to float."
Here is a snippet of my code:
X = feature_df_wav2vec['features'].apply(lambda x: np.array(x).reshape(-1, 1))
y = feature_df_wav2vec['label']
#X = X.astype(float)
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.3, random_state=42)
X
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0,1))
X_train_scaled = X_train.copy()
X_train_scaled = scaler.fit_transform(np.vstack(X_train).T).flatten() #error-here
X_test_scaled = X_test.copy()
X_test_scaled = scaler.transform(np.vstack(X_test).T).flatten()
svm_classifier = SVC(kernel='linear', C=1.0)
svm_classifier.fit(X_train, y_train)
X_train_scaled = scaler.fit_transform(np.vstack(X_train).T).flatten()
#facing error from this line
I have tried methods like scaling, converting data types, etc., but none have resolved the issue. Could someone please guide me on how to properly preprocess the "features" column before fitting the SVC model?
The problem is in the first line:
X = feature_df_wav2vec['features'].apply(lambda x: np.array(x).reshape(-1, 1))
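The values in the features column are most likely read from the CSV as strings such as "[0.1 0.2 ...]", so np.array(x) only wraps the text instead of producing numbers, and the scaler then fails with "could not convert string to float". Use np.fromstring to convert each features string to an np.array (x[1:-1] strips the surrounding brackets):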
X = feature_df_wav2vec['features'].apply(lambda x: np.fromstring(x[1:-1], sep=' ')).values
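To make the difference concrete, here is a minimal sketch of what the two conversions do to a single cell. It assumes each cell is the printed form of a 1-D NumPy array (square brackets, space-separated numbers), which is what x[1:-1] and sep=' ' imply; the sample string and the variable names are made up for illustration and are not taken from the actual CSV.

```python
import numpy as np

# Hypothetical cell value as it would arrive from pd.read_csv: a str, not an array
cell = "[0.12 -0.5 1.0 3.25]"

# The original approach: np.array on a str gives a string array with one element
as_str_array = np.array(cell).reshape(-1, 1)
print(as_str_array.dtype.kind, as_str_array.shape)  # kind 'U' means unicode text

# The fix: strip the brackets, then parse the space-separated numbers
parsed = np.fromstring(cell[1:-1], sep=' ')
print(parsed.dtype, parsed.shape)
```

If the strings in the real file use a different layout (for example comma separators), the sep argument would need to change accordingly.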
Full code:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.svm import SVC

# Parse each "[v1 v2 ...]" string into a 1-D float array
X = feature_df_wav2vec['features'].apply(lambda x: np.fromstring(x[1:-1], sep=' ')).values
y = feature_df_wav2vec['label']
# Encode the string labels as integers
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.3, random_state=42)
# Stack the per-row arrays into 2-D matrices of shape (n_samples, n_features)
X_train_2d = np.vstack(X_train)
X_test_2d = np.vstack(X_test)
# Fit the scaler on the training data only, then reuse it for the test data
scaler = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler.fit_transform(X_train_2d)
X_test_scaled = scaler.transform(X_test_2d)
svm_classifier = SVC(kernel='linear', C=1.0)
svm_classifier.fit(X_train_scaled, y_train)
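As an alternative way to wire the same steps together, the scaler and the classifier can be combined in a scikit-learn pipeline so that scaling is always applied consistently at fit and predict time. The sketch below is only an illustration under stated assumptions: the DataFrame df is synthetic stand-in data (random vectors of length 8 stored as bracketed strings), not the real CSV, and stratify=y is an extra choice that the code above does not make.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the CSV: feature vectors stored as strings, two classes
rng = np.random.default_rng(0)
rows = []
for i in range(40):
    label = "Controlled" if i % 2 == 0 else "Abnormal"
    center = 0.0 if label == "Controlled" else 3.0
    vec = rng.normal(center, 1.0, size=8)
    text = "[" + " ".join(f"{v:.6f}" for v in vec) + "]"
    rows.append({"features": text, "label": label})
df = pd.DataFrame(rows)

# Parse every string into a float array and stack into (n_samples, n_features)
parsed = df["features"].apply(lambda s: np.fromstring(s[1:-1], sep=" "))
X = np.vstack(parsed.tolist())
y = LabelEncoder().fit_transform(df["label"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training data and reuses it when scoring
model = make_pipeline(MinMaxScaler(), SVC(kernel="linear", C=1.0))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(X.shape, accuracy)
```

With the real data, df would be replaced by feature_df_wav2vec and the synthetic block at the top dropped.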