I am trying to apply the SMOTE function to balance my classes.
This is my code:
smote_train <- SMOTE(tested_covid ~., data = dataTrain, k = 5, perc.over = 100, perc.under = 200)
And this is the error, along with the warnings:
Error in T[, col] <- data[, col] :
incorrect number of subscripts on matrix
In addition: Warning messages:
1: In if (class(data[, col]) %in% c("factor", "character")) { :
the condition has length > 1 and only the first element will be used
2: In if (class(data[, col]) %in% c("factor", "character")) { :
the condition has length > 1 and only the first element will be used
This is the structure of my data, with the column types:
structure(list(id = c("ff0113a9-79d4-4042-992f-c5092e30b6af",
"7b104740-c0c2-44bb-82d8-442ea06a3a96", "8533b6e2-bffe-46da-8056-8b77b89a5819",
"21d33ae7-8ad8-4744-8370-d376a7e5d251", "c9225467-8ff1-4305-85ad-6c9386e38347",
"e2e445c4-dffd-4543-b311-efdf2af23744"), age = c(63, 19, 23,
28, 40, 31), gender = c("Male", "Female", "Male", "Female", "Female",
"Male"), country = c("India", "Phillipines", "India", "Phillipines",
"South Africa", "Pakistan"), chills = c("No", "Mild", "No", "Mild",
"No", "No"), Cough = c("No", "Severe", "No", "Mild", "Mild",
"No"), diarrhoea = c("No", "Mild", "No", "No", "No", "No"), fatigue = c("No",
"Moderate", "Mild", "Mild", "Mild", "Mild"), healthcare_worker = c("No",
"No", "No", "No", "No", "Yes"), how_unwell = c(1, 7, 1, 6, 4,
2), comorbidity_one = c("Asthma (managed with an inhaler)", "None",
"Obesity", "High Blood Pressure (hypertension)", "None", "None"
), loss_smell_taste = c("No", "No", "No", "No", "No", "No"),
muscle_ache = c("No", "Moderate", "No", "Moderate", "Mild",
"Mild"), nasal_congestion = c("No", "No", "No", "No", "Mild",
"No"), nausea_vomiting = c("No", "No", "No", "No", "No",
"No"), no_days_symptoms_show = c("None", "4", "None", "More than 21",
"None", "2"), self_diagnosis = c("None", "Mild", "None",
"Mild", "None", "Mild"), shortness_breath = c("No", "Mild",
"No", "No", "No", "Mild"), sore_throat = c("No", "No", "No",
"No", "Mild", "No"), sputum = c("No", "Mild", "No", "Mild",
"Mild", "No"), temperature = c("No", "No", "No", "No", "No",
"37.5-38"), tested_covid = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = c("Negative", "Positive"), class = "factor")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
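I solved the problem by reading the data with read.csv instead of read_csv, so that dataTrain is a plain data.frame rather than a tibble. I also converted the character variables to factor and the integer variables to numeric. The tibble appears to be the cause of the error: data[, col] on a tibble returns a one-column tibble rather than a vector, so class() returns three values (hence the "condition has length > 1" warnings) and the assignment T[, col] <- data[, col] fails.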
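The conversion described above can be sketched in base R as follows. This is a minimal sketch, not necessarily the exact code used: the small dataTrain below is a made-up stand-in with only three of the columns, and the name df is an assumption. The SMOTE call is left commented out because it needs the package that provides SMOTE to be installed.

```r
# Made-up stand-in for dataTrain, carrying the same tibble classes as the dput output.
dataTrain <- structure(
  list(
    age = c(63L, 19L, 23L, 28L),
    gender = c("Male", "Female", "Male", "Female"),
    tested_covid = factor(c("Negative", "Negative", "Positive", "Negative"),
                          levels = c("Negative", "Positive"))
  ),
  row.names = c(NA, -4L),
  class = c("tbl_df", "tbl", "data.frame")
)

# Drop the tibble classes so that data[, col] returns a plain vector.
df <- as.data.frame(dataTrain)

# Work out which columns need converting before changing anything.
is_chr <- vapply(df, is.character, logical(1))
is_int <- vapply(df, function(x) is.integer(x) && !is.factor(x), logical(1))

# character -> factor, integer -> numeric (double).
df[is_chr] <- lapply(df[is_chr], factor)
df[is_int] <- lapply(df[is_int], as.numeric)

str(df)

# With the converted data, the original call would become:
# smote_train <- SMOTE(tested_covid ~ ., data = df, k = 5,
#                      perc.over = 100, perc.under = 200)
```

Reading the file with read.csv(..., stringsAsFactors = TRUE) should give a similar result in one step, since it returns a plain data.frame with character columns already converted to factors.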