I want to discretize a column that contains a continuous variable.
The data looks like this:
c(0,25,77,423,6,8,3,65,32,22,10,0,8,0,15,0,10,1,2,4,5,5,6)
I want to turn the numbers into categories by discretizing, but zeros should form their own category. Discretizing directly can put other numbers into the same bin as zero.
I thought I could get what I want by leaving the zeros out and then discretizing, but I can't do that in a data frame column because the indexes no longer line up.
Here is an example dput() output:
structure(list(dummy_column = c(0, 25, 77, 423, 6, 8, 3, 65,
32, 22, 10, 0, 8, 0, 15, 0, 10, 1, 2, 4, 5, 5, 6)), class = "data.frame", row.names = c(NA,
-23L))
For example, if I use 2 breaks, the categories should be zero plus the 3 discretized intervals, 4 categories in total. Ideally I could write a function that discretizes a column and can be used directly inside dplyr::mutate().
Thanks in advance.
If I understood correctly, your goal is to keep "0" as a separate category when discretizing. Here's a solution that wraps arules::discretize in a new function to accomplish this:
library(arules)
#> Loading required package: Matrix
#>
#> Attaching package: 'arules'
#> The following objects are masked from 'package:base':
#>
#> abbreviate, write
library(tidyverse)
df <- structure(list(dummy_column = c(0, 25, 77, 423, 6, 8, 3, 65,
32, 22, 10, 0, 8, 0, 15, 0, 10, 1, 2, 4, 5, 5, 6)), class = "data.frame", row.names = c(NA,
-23L))
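# Discretize `vec`, but keep every value equal to `keep` in its own level.
# Extra arguments (...) are passed on to arules::discretize().
discretize_keep <- function(vec, keep, ...) {
  vec2 <- vec
  # Hide the kept value so it does not influence the interval boundaries
  vec2[vec2 == keep] <- NA
  dsc <- arules::discretize(vec2, ...)
  # Turn the NAs back into an explicit level such as "[0]".
  # Note: fct_explicit_na() is deprecated as of forcats 1.0.0 in favour of
  # fct_na_value_to_level(dsc, level = ...).
  fct_explicit_na(dsc, na_level = str_glue("[{keep}]"))
}

df %>%
  mutate(discrete_column = discretize_keep(dummy_column, keep = 0, breaks = 3))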
#>    dummy_column discrete_column
#> 1             0             [0]
#> 2            25        [15,423]
#> 3            77        [15,423]
#> 4           423        [15,423]
#> 5             6          [6,15)
#> 6             8          [6,15)
#> 7             3           [1,6)
#> 8            65        [15,423]
#> 9            32        [15,423]
#> 10           22        [15,423]
#> 11           10          [6,15)
#> 12            0             [0]
#> 13            8          [6,15)
#> 14            0             [0]
#> 15           15        [15,423]
#> 16            0             [0]
#> 17           10          [6,15)
#> 18            1           [1,6)
#> 19            2           [1,6)
#> 20            4           [1,6)
#> 21            5           [1,6)
#> 22            5           [1,6)
#> 23            6          [6,15)
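If you would rather not depend on arules, the same idea can be sketched in base R with quantile() and cut(). This is only an illustration under some assumptions: the helper name discretize_keep_base and its n_bins argument are made up for this example, and it uses equal-frequency bins, so the interval boundaries are not guaranteed to match what arules::discretize returns for every input.

```r
# Hypothetical helper (not part of arules): bin all values except `keep`
# into `n_bins` equal-frequency intervals and give `keep` its own level.
discretize_keep_base <- function(vec, keep = 0, n_bins = 3) {
  is_keep <- !is.na(vec) & vec == keep
  rest <- vec[!is_keep]

  # Quantile-based cut points; unique() guards against duplicated breaks
  brks <- unique(quantile(rest,
                          probs = seq(0, 1, length.out = n_bins + 1),
                          na.rm = TRUE, names = FALSE))

  # Left-closed intervals, with the maximum included in the last one
  binned <- cut(rest, breaks = brks, include.lowest = TRUE, right = FALSE)

  keep_label <- paste0("[", keep, "]")
  out <- factor(rep(NA_character_, length(vec)),
                levels = c(keep_label, levels(binned)))
  out[!is_keep] <- as.character(binned)
  out[is_keep] <- keep_label
  out
}

x <- c(0, 25, 77, 423, 6, 8, 3, 65, 32, 22, 10, 0,
       8, 0, 15, 0, 10, 1, 2, 4, 5, 5, 6)
res <- discretize_keep_base(x, keep = 0, n_bins = 3)
table(res)
```

Because the function takes a vector and returns a factor of the same length, it should also work inside dplyr::mutate(), e.g. mutate(discrete_column = discretize_keep_base(dummy_column)). Heavily tied data can collapse the quantile breaks into fewer intervals than requested, so check the resulting levels on your real column.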