r smote

Error and warnings when applying the SMOTE function to balance classes


I am trying to apply the SMOTE function to balance my classes.

This is my code:

smote_train <- SMOTE(tested_covid ~., data = dataTrain, k = 5, perc.over = 100, perc.under = 200)

And this is the error, along with the warnings:

Error in T[, col] <- data[, col] : 
  incorrect number of subscripts on matrix
In addition: Warning messages:
1: In if (class(data[, col]) %in% c("factor", "character")) { :
  the condition has length > 1 and only the first element will be used
2: In if (class(data[, col]) %in% c("factor", "character")) { :
  the condition has length > 1 and only the first element will be used

This is the data structure and type I have:

structure(list(id = c("ff0113a9-79d4-4042-992f-c5092e30b6af", 
"7b104740-c0c2-44bb-82d8-442ea06a3a96", "8533b6e2-bffe-46da-8056-8b77b89a5819", 
"21d33ae7-8ad8-4744-8370-d376a7e5d251", "c9225467-8ff1-4305-85ad-6c9386e38347", 
"e2e445c4-dffd-4543-b311-efdf2af23744"), age = c(63, 19, 23, 
28, 40, 31), gender = c("Male", "Female", "Male", "Female", "Female", 
"Male"), country = c("India", "Phillipines", "India", "Phillipines", 
"South Africa", "Pakistan"), chills = c("No", "Mild", "No", "Mild", 
"No", "No"), Cough = c("No", "Severe", "No", "Mild", "Mild", 
"No"), diarrhoea = c("No", "Mild", "No", "No", "No", "No"), fatigue = c("No", 
"Moderate", "Mild", "Mild", "Mild", "Mild"), healthcare_worker = c("No", 
"No", "No", "No", "No", "Yes"), how_unwell = c(1, 7, 1, 6, 4, 
2), comorbidity_one = c("Asthma (managed with an inhaler)", "None", 
"Obesity", "High Blood Pressure (hypertension)", "None", "None"
), loss_smell_taste = c("No", "No", "No", "No", "No", "No"), 
    muscle_ache = c("No", "Moderate", "No", "Moderate", "Mild", 
    "Mild"), nasal_congestion = c("No", "No", "No", "No", "Mild", 
    "No"), nausea_vomiting = c("No", "No", "No", "No", "No", 
    "No"), no_days_symptoms_show = c("None", "4", "None", "More than 21", 
    "None", "2"), self_diagnosis = c("None", "Mild", "None", 
    "Mild", "None", "Mild"), shortness_breath = c("No", "Mild", 
    "No", "No", "No", "Mild"), sore_throat = c("No", "No", "No", 
    "No", "Mild", "No"), sputum = c("No", "Mild", "No", "Mild", 
    "Mild", "No"), temperature = c("No", "No", "No", "No", "No", 
    "37.5-38"), tested_covid = structure(c(1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("Negative", "Positive"), class = "factor")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Solution

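  • I read the data with read.csv instead of read_csv. I also converted the character variables to factors and the integer variables to numeric, and that solved the problem. The likely reason is that read_csv returns a tibble, and for a tibble data[, col] is a one-column tibble rather than a plain vector, so class(data[, col]) has more than one element (which matches the "condition has length > 1" warnings) and the assignment T[, col] <- data[, col] fails.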
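
The fix described above can also be applied to data that has already been read in, without re-reading the file. The following is a minimal sketch in base R, not a definitive implementation: the helper name prepare_for_smote is hypothetical, and the small dataTrain below is a made-up stand-in for the real data. It converts the input to a plain data.frame, turns character columns into factors, and turns integer columns into numeric.

```r
# Hypothetical helper (not part of any package): coerce a tibble or
# data frame into the layout the solution describes.
prepare_for_smote <- function(df) {
  # Drop any extra classes such as tbl_df/tbl, so that df[, col]
  # returns a plain vector instead of a one-column tibble.
  df <- as.data.frame(df)

  # Character -> factor, integer -> numeric; other columns are kept as-is.
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      factor(col)
    } else if (is.integer(col)) {
      as.numeric(col)
    } else {
      col
    }
  })
  df
}

# Made-up stand-in for dataTrain, for illustration only.
dataTrain <- data.frame(
  age = c(63L, 19L, 23L, 28L),
  gender = c("Male", "Female", "Male", "Female"),
  tested_covid = factor(c("Negative", "Negative", "Positive", "Negative")),
  stringsAsFactors = FALSE
)

dataTrainClean <- prepare_for_smote(dataTrain)
str(dataTrainClean)

# The original call would then use the cleaned data (assumes the package
# providing SMOTE is already loaded):
# smote_train <- SMOTE(tested_covid ~ ., data = dataTrainClean,
#                      k = 5, perc.over = 100, perc.under = 200)
```

The column-wise lapply with df[] <- keeps the result a data.frame and leaves columns that are already factor or numeric untouched, so the helper is safe to run more than once.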