
cut() in R does not work: "x" must be numeric, but it is


I have a table with a timestamp column and 9 columns filled with numbers. By default they were character columns, so I used:

library(dplyr)

data[-1] <- data[-1] %>% mutate_all(~as.numeric(as.character(.)))

This worked without any error. Now I want to bin the values and turn each one into "-", "o" or "+", but somehow it doesn't work:

for(j in 2:10) data[,j] <- cut(data[,j], c(-4,-0.5,0.5,4), labels= c("-","o","+"))

The error that appears (the original message is in German; the translation is mine):

Error in cut.default(aggr_beer1[, j], c(-4, -0.5, 0.5, 4), labels = c("-", : 'x' must be numeric

Can someone please help me? What am I doing wrong?


Solution

  • Notice the difference between

    library(dplyr)
    
    data[,2]
    # A tibble: 5 × 1
          a
      <dbl>
    1    -3
    2    -2
    3    -1
    4     0
    5     1
    

    and

    as.data.frame(data)[,2]
    [1] -3 -2 -1  0  1
    

    Subsetting a tibble by column (data[,j]) returns a one-column tibble, not a vector, while cut() needs a numeric vector; that is why it reports "'x' must be numeric". A plain data.frame, by contrast, drops to a vector when you select a single column this way. As a side note, to keep the data frame structure use drop=FALSE, e.g. as.data.frame(data)[,2,drop=FALSE].
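
    The difference can be checked directly. This is a minimal sketch on a small hypothetical tibble tb (not the asker's data); tryCatch() is used only to capture the error message instead of stopping:

    ```r
    library(tibble)

    # Hypothetical minimal data: a time column plus one numeric column
    tb <- tibble(time = 1:5, a = c(-3, -2, -1, 0, 1))
    df <- as.data.frame(tb)

    class(tb[, 2])  # one-column tibble: "tbl_df" "tbl" "data.frame"
    class(df[, 2])  # a plain data.frame drops to a vector: "numeric"
    class(tb[[2]])  # [[ ]] extracts the column vector from either: "numeric"

    # cut() on the one-column tibble reproduces the question's error;
    # tryCatch() returns the error message as a string instead of stopping
    msg <- tryCatch(
      cut(tb[, 2], c(-4, -0.5, 0.5, 4)),
      error = function(e) conditionMessage(e)
    )
    msg
    ```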

    Another way to approach this: suppose this is your data, with the value columns still stored as characters

    data <- structure(list(grp = c("A", "B", "C", "D", "E"), a = c("-3", 
    "-2", "-1", "0", "1"), b = c("2", "3", "4", "5", "6")), row.names = c(NA, 
    -5L), class = c("tbl_df", "tbl", "data.frame"))
    
    data
    # A tibble: 5 × 3
      grp   a     b    
      <chr> <chr> <chr>
    1 A     -3    2    
    2 B     -2    3    
    3 C     -1    4    
    4 D     0     5    
    5 E     1     6
    

    Then you can get the desired result with across(), converting and cutting in one step:

    library(dplyr)
    
    data %>% 
      mutate(across(a:b, ~ 
               cut(as.numeric(.x), c(-4,-0.5,0.5,4), labels= c("-","o","+"))))
    # A tibble: 5 × 3
      grp   a     b    
      <chr> <fct> <fct>
    1 A     -     +    
    2 B     -     +    
    3 C     -     +    
    4 D     o     NA   
    5 E     +     NA 
    

    Note that mutate_all has been superseded by across, see ?mutate_all.
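
    For reference, the conversion step from the question can be written with across() too. This is a sketch on hypothetical character data; -1 selects every column except the first, mirroring data[-1]:

    ```r
    library(dplyr)

    # Hypothetical data shaped like the question: column 1 is a timestamp,
    # the remaining columns hold numbers stored as text
    data <- tibble(
      time = c("2024-01-01 10:00", "2024-01-01 11:00"),
      x1 = c("1.5", "-0.2"),
      x2 = c("3", "-2")
    )

    # across() version of data[-1] %>% mutate_all(~as.numeric(as.character(.))):
    # as.character() guards against factor columns, where as.numeric()
    # alone would return the level codes instead of the values
    data <- data %>% mutate(across(-1, ~ as.numeric(as.character(.x))))
    data
    ```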

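    To get your own loop working, use drop=TRUE so that each column is extracted as a plain vector (this assumes the columns are already numeric, i.e. after your mutate_all step):

    for(j in 2:10) data[,j] <- cut(data[,j,drop=TRUE], c(-4,-0.5,0.5,4), labels= c("-","o","+"))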
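
    A self-contained run of the fixed loop, assuming a hypothetical tibble with a timestamp column and two numeric columns (the real data has nine value columns, so the loop would run over 2:10):

    ```r
    library(tibble)

    # Hypothetical stand-in for the asker's data: timestamp + numeric columns
    data <- tibble(
      time = as.POSIXct("2024-01-01", tz = "UTC") + 0:4,
      a = c(-3, -2, -1, 0, 1),
      b = c(0.2, -0.7, 3, 5, 0.5)
    )

    breaks <- c(-4, -0.5, 0.5, 4)
    labs <- c("-", "o", "+")

    for (j in 2:ncol(data)) {
      # drop = TRUE returns the column as a plain numeric vector,
      # which is what cut(); data[[j]] would extract it the same way
      data[, j] <- cut(data[, j, drop = TRUE], breaks, labels = labs)
    }
    data
    ```

    cut() intervals are closed on the right by default, so 0.5 falls into "o", and values outside (-4, 4], such as 5 here, become NA, as in the across() output above.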