r, dataframe, imbalanced-data, oversampling

R: Error in model.frame.default(formula = class ~ step + type + amount + :) : object is not a matrix


I am new to R and I am trying to play around with the data from here. I tried to oversample it, but I get an error from model.frame.default.

  1. First attempt:
oversample_data <- ovun.sample(class ~ ., data = sample_dataset, p = 0.5, seed = 1, method="over")$data

But

Error in model.frame.default(formula = class ~ step + type + amount + : object is not a matrix

is shown.

  2. So I made a second attempt: I converted the dataset to a matrix first and then oversampled it.
org_dataset <- as.matrix(org_dataset[complete.cases(org_dataset), ])

data_balanced_over <- ovun.sample(class ~ ., data = org_dataset, p = 0.5, seed = 1, method = "over")$data

But it says

Error in model.frame.default(formula = class ~ step + type + amount + : 'data' must be a data.frame, not a matrix or an array

This confuses me. What is the right way to do oversampling?


Solution

  • The problem is the formula you're setting for ovun.sample. There is no variable named class in the dataset you're referring to. The documentation of the ROSE package for the formula says that

    The left-hand-side (response) should be a vector specifying the class labels. The right-hand-side should be a series of vectors with the predictors.

    Thus, you have to specify the variable that holds the class labels. Given the dataset, I assume this is isFraud. The call would then be:

    oversample_data <- ovun.sample(isFraud ~ ., data = sample_dataset, p = 0.5, seed = 1, method="over")$data
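To check this end to end, here is a minimal sketch on made-up data. The data frame `toy` is hypothetical and only borrows the column names step, type, amount and isFraud from the question; it assumes the ROSE package is installed. It keeps the input as a data.frame (not a matrix), confirms that the response column exists, and compares the class counts before and after oversampling.

```r
library(ROSE)  # provides ovun.sample()

# Hypothetical toy data standing in for sample_dataset; only the column
# names mirror the question, the values are random.
set.seed(1)
toy <- data.frame(
  step    = sample(1:10, 100, replace = TRUE),
  type    = factor(sample(c("PAYMENT", "TRANSFER", "CASH_OUT"), 100, replace = TRUE)),
  amount  = round(runif(100, 1, 1000), 2),
  isFraud = factor(c(rep(0, 95), rep(1, 5)))  # 95 majority rows, 5 minority rows
)

# Confirm the response column named in the formula really exists in the data.
stopifnot("isFraud" %in% names(toy))

# Class counts before oversampling.
print(table(toy$isFraud))

# Oversample the minority class; data stays a data.frame, not a matrix.
balanced <- ovun.sample(isFraud ~ ., data = toy, p = 0.5,
                        seed = 1, method = "over")$data

# Class counts after oversampling; the two classes should be roughly balanced.
print(table(balanced$isFraud))
```

Running names(sample_dataset) on your own data first is a quick way to see which column holds the class labels before writing the formula.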