I am fairly new to R, and I am using it for my thesis. I tried to write a set of commands that recode a range of numeric values as a categorical variable. The values in my dataset range from 1 to 13. For some reason, none of the double-digit values get grouped into the factor level I created, and I don't know why.
Here is my code creating the categorical groups and converting them to a factor, followed by the output:
> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions <=2] <- "≤2"
> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=3 & Desc$Number.of.Chronic.conditions <5] <- "3 - 4"
> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=5 & Desc$Number.of.Chronic.conditions <7] <- "5 - 6"
> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=7 & Desc$Number.of.Chronic.conditions <9] <- "7 - 8"
> Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=9] <- "≥9"
> Desc$Number.of.Chronic.conditions <- factor(Desc$Number.of.Chronic.conditions)
> print(Desc$Number.of.Chronic.conditions)
[1] 5 - 6 7 - 8 ≤2 5 - 6 5 - 6 3 - 4 7 - 8 ≤2 7 - 8 3 - 4 5 - 6 ≤2 5 - 6 ≥9 ≤2 5 - 6 5 - 6 7 - 8 10
[20] 7 - 8 11 7 - 8 5 - 6 ≤2 3 - 4 5 - 6 ≥9 5 - 6 7 - 8 3 - 4 3 - 4 5 - 6 ≤2 ≤2 5 - 6 3 - 4 7 - 8 3 - 4
[39] ≤2 7 - 8 5 - 6 7 - 8 7 - 8 5 - 6 10 ≤2 ≤2 ≤2 ≤2 3 - 4 3 - 4 ≤2 ≤2 ≤2 7 - 8 ≤2 ≤2
[58] 7 - 8 ≤2 3 - 4 3 - 4 ≤2 13 3 - 4 3 - 4 3 - 4 7 - 8 5 - 6 3 - 4 5 - 6 3 - 4 5 - 6 5 - 6 5 - 6 3 - 4 3 - 4
[77] 5 - 6 ≥9 ≤2 ≤2 10 3 - 4 7 - 8 11 7 - 8 5 - 6 3 - 4 3 - 4 ≥9 3 - 4 3 - 4 5 - 6 3 - 4 7 - 8 5 - 6
[96] 5 - 6 3 - 4 12 10 ≤2 5 - 6 5 - 6 3 - 4 3 - 4 3 - 4 5 - 6 5 - 6 3 - 4 5 - 6 ≤2 5 - 6 3 - 4 5 - 6 3 - 4
[115] 3 - 4 ≤2 5 - 6 7 - 8 3 - 4 ≤2 3 - 4 7 - 8 5 - 6 7 - 8 5 - 6 ≤2 7 - 8 ≤2 ≤2 ≥9 7 - 8 ≥9 3 - 4
[134] 5 - 6 ≤2 5 - 6 3 - 4 ≤2 3 - 4 3 - 4 ≤2 5 - 6 3 - 4 ≤2 7 - 8 3 - 4 ≤2 ≤2 3 - 4 3 - 4 ≤2 10
[153] 3 - 4 5 - 6 5 - 6 5 - 6 5 - 6 3 - 4 5 - 6 5 - 6 5 - 6 7 - 8 5 - 6 5 - 6 5 - 6 10 5 - 6 3 - 4 3 - 4 ≤2 3 - 4
[172] ≤2 7 - 8 ≤2 ≤2 7 - 8 ≤2 7 - 8 10 5 - 6 ≥9 3 - 4 3 - 4
Levels: ≤2 ≥9 10 11 12 13 3 - 4 5 - 6 7 - 8
> summary(Desc$Number.of.Chronic.conditions)
   ≤2    ≥9    10    11    12    13 3 - 4 5 - 6 7 - 8
   40     7     7     2     1     1    49    49    27
A vector in R holds a single type. Even if you start with a vector of integers, assigning a character string to any element converts the whole vector to strings.
str(Desc) ## str shows the structure of an object
'data.frame': 15 obs. of 1 variable:
$ Number.of.Chronic.conditions: int 1 2 3 4 5 6 7 8 9 10 ...
Desc$Number.of.Chronic.conditions[Desc$Number.of.Chronic.conditions <=2] <- "≤2"
str(Desc) ## after one assignment the column is already character
'data.frame': 15 obs. of 1 variable:
$ Number.of.Chronic.conditions: chr "≤2" "≤2" "3" "4" ...
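The rules for >= are different for strings than for integers. Once the column holds strings, R converts the number on the other side of the comparison to a string as well and compares the two as text, character by character. Both "9" and 9 are >= 9; however, "10" is not >= 9, because "1" sorts before "9". That is why 10 through 13 never match your last condition and stay as their own levels.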
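The difference is easy to reproduce on a small vector. This is only a minimal sketch; the vector names (counts, as_text) are made up for illustration and are not part of your data.

```r
# Hypothetical values standing in for the double-digit counts
counts <- c(9L, 10L, 13L)

# Integer comparison: every value is >= 9
numeric_result <- counts >= 9
print(numeric_result)
# [1] TRUE TRUE TRUE

# The same values stored as strings, as happens after the first recode
as_text <- as.character(counts)

# 9 is coerced to "9" and the comparison is done on text
text_result <- as_text >= 9
print(text_result)
# [1]  TRUE FALSE FALSE

# Converting back to numbers restores the numeric comparison
restored_result <- as.numeric(as_text) >= 9
print(restored_result)
# [1] TRUE TRUE TRUE
```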
There are at least two ways to solve this: perform all the binning in one step using a function like the dplyr library's mutate(case_when(...)), or push the binned factor into its own column so that the original numeric column is never overwritten:
Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions <=2] <- "≤2"
Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=3 & Desc$Number.of.Chronic.conditions <5] <- "3 - 4"
Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=5 & Desc$Number.of.Chronic.conditions <7] <- "5 - 6"
Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=7 & Desc$Number.of.Chronic.conditions <9] <- "7 - 8"
Desc$Factor.of.Chronic.conditions[Desc$Number.of.Chronic.conditions >=9] <- "≥9"
Desc$Factor.of.Chronic.conditions <- factor(Desc$Factor.of.Chronic.conditions)
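For the one-step route with dplyr mentioned above, something along these lines should work. Treat it as a sketch: the Desc data frame here is a made-up stand-in with the values 1 to 13, and the explicit levels argument is my addition so the categories print in numeric order rather than alphabetically.

```r
library(dplyr)

# Hypothetical sample data standing in for the real Desc data frame
Desc <- data.frame(Number.of.Chronic.conditions = 1:13)

Desc <- Desc %>%
  mutate(
    # case_when() checks the conditions in order and uses the first match,
    # so each line only needs the upper bound of its bin
    Factor.of.Chronic.conditions = case_when(
      Number.of.Chronic.conditions <= 2 ~ "≤2",
      Number.of.Chronic.conditions <= 4 ~ "3 - 4",
      Number.of.Chronic.conditions <= 6 ~ "5 - 6",
      Number.of.Chronic.conditions <= 8 ~ "7 - 8",
      Number.of.Chronic.conditions >= 9 ~ "≥9"
    ),
    # Fix the level order explicitly instead of relying on alphabetical sorting
    Factor.of.Chronic.conditions = factor(
      Factor.of.Chronic.conditions,
      levels = c("≤2", "3 - 4", "5 - 6", "7 - 8", "≥9")
    )
  )

bin_counts <- table(Desc$Factor.of.Chronic.conditions)
print(bin_counts)
```

Because the numeric column is only read and never assigned to, every comparison stays numeric and 10 through 13 land in "≥9".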