The following code attempts to use the `dummyVars` function from the caret package.
This is .Rmd code and uses a dataset available in the ggplot2 package, so it can be fully replicated.
```{r}
#rm(list = ls())
```
```{r}
library(ggplot2)
```
```{r}
data("diamonds")
```
```{r}
data <- diamonds
summary(data)
str(data)
```
```{r}
library(caret)
```
```{r}
dmy <- dummyVars(formula = ~ cut + color + clarity,
                 data = data,
                 fullRank = FALSE)
b.vars <- data.frame(predict(dmy, newdata = data))
head(b.vars, n = 10)
```
`b.vars` should be a data frame of dummy variables (0s and 1s), but it contains double values such as 0.6324555.
The column names in `b.vars` are also not what I expect. For example, there is "cut.L" instead of "cut.Fair".
This is the same process I've used in the past, and I don't understand what I'm doing wrong.
Could someone please point out my error?
Thanks!
```{r}
library(ggplot2)
library(caret)

data("diamonds")
data <- diamonds
data
summary(data)
str(data)

# cut, color and clarity are ordered factors in diamonds;
# round-trip through character to get plain (unordered) factors
data$cut <- as.factor(as.character(data$cut))
data$clarity <- as.factor(as.character(data$clarity))
data$color <- as.factor(as.character(data$color))
sapply(data, class)

# fullRank = TRUE drops the first level of each factor
dmy <- dummyVars(formula = ~ cut + color + clarity,
                 data = data,
                 fullRank = TRUE)
b.vars <- data.frame(predict(dmy, newdata = data))
head(b.vars, n = 10)
```
```
   cut.Good cut.Ideal cut.Premium cut.Very.Good color.E color.F color.G color.H color.I color.J clarity.IF clarity.SI1 clarity.SI2 clarity.VS1 clarity.VS2 clarity.VVS1
1         0         1           0             0       1       0       0       0       0       0          0           0           1           0           0            0
2         0         0           1             0       1       0       0       0       0       0          0           1           0           0           0            0
3         1         0           0             0       1       0       0       0       0       0          0           0           0           1           0            0
4         0         0           1             0       0       0       0       0       1       0          0           0           0           0           1            0
5         1         0           0             0       0       0       0       0       0       1          0           0           1           0           0            0
6         0         0           0             1       0       0       0       0       0       1          0           0           0           0           0            0
7         0         0           0             1       0       0       0       0       1       0          0           0           0           0           0            1
8         0         0           0             1       0       0       0       1       0       0          0           1           0           0           0            0
9         0         0           0             0       1       0       0       0       0       0          0           0           0           0           1            0
10        0         0           0             1       0       0       0       1       0       0          0           0           0           1           0            0
   clarity.VVS2
1             0
2             0
3             0
4             0
5             0
6             1
7             0
8             0
9             0
10            0
```
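One thing to notice in the output above is that the columns follow alphabetical level order (`cut.Good`, `cut.Ideal`, `cut.Premium`, `cut.Very.Good`), and the alphabetically first level of each factor is the one dropped by `fullRank = TRUE`. That is a side effect of going through `as.character()`. If the original level order matters, `factor(x, ordered = FALSE)` is one alternative that should also remove the "ordered" class while keeping the levels as they were. The sketch below uses a small toy vector (`x`, `cut_levels`, `via_char` and `keep_order` are illustrative names, not from the original code) rather than the full `diamonds` data:

```r
# Toy ordered factor mirroring the levels of diamonds$cut (illustrative data only)
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
x <- factor(c("Ideal", "Premium", "Good", "Fair", "Very Good"),
            levels = cut_levels, ordered = TRUE)

# Round trip through character: unordered, levels re-sorted alphabetically
via_char <- as.factor(as.character(x))
levels(via_char)

# factor(..., ordered = FALSE): unordered, observed levels keep their order
keep_order <- factor(x, ordered = FALSE)
levels(keep_order)

# Neither result is an ordered factor any more
is.ordered(via_char)
is.ordered(keep_order)
```

Either form should work with `dummyVars`; the difference is only in column order and in which level ends up as the reference when `fullRank = TRUE`.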
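Get rid of the "ordered" class of your variables. In `diamonds`, `cut`, `color` and `clarity` are ordered factors, and R encodes ordered factors with polynomial contrasts by default, which is where column names such as "cut.L" and values such as 0.6324555 come from. You can remove the class by converting each variable to character and back to factor on the fly.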
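To see where the fractional values come from, here is a minimal base R sketch. It assumes the default `options("contrasts")` setting, under which ordered factors use `contr.poly`, and it uses `model.matrix` on a toy data frame (`toy`, `ordered_mm` and `plain_mm` are illustrative names) instead of `dummyVars`, so it only approximates what caret does internally:

```r
# Toy data frame with one ordered factor shaped like diamonds$cut
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
toy <- data.frame(
  cut = factor(cut_levels, levels = cut_levels, ordered = TRUE)
)

# With default options, ordered factors get polynomial contrasts
getOption("contrasts")
ordered_mm <- model.matrix(~ cut, data = toy)
colnames(ordered_mm)             # includes "cut.L", "cut.Q", "cut.C"
round(ordered_mm[, "cut.L"], 7)  # linear term; largest value is 0.6324555

# After dropping the ordered class, the encoding is plain 0/1 indicators
toy$cut <- factor(toy$cut, ordered = FALSE)
plain_mm <- model.matrix(~ cut, data = toy)
colnames(plain_mm)
plain_mm
```

The linear contrast for a five-level ordered factor tops out at 0.6324555, which matches the value reported in the question.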