r string dplyr factors

In R, how can I convert strings to numbers, and then to factors?


I have a dataset with text responses from multiple surveys. The responses were given on a Likert scale, but the text wasn't standardized. For example:

# create df
df <- data.frame(
  id = c('person1', 'person2', 'person3'),
  category = c('I am 0-10 years old', 'I am 11-20 years old', 'I am between 21-30 years old'),
  Q1.do.you.feel.tired.everyday = c('no, never', 'yes, sometimes', 'yes some-times')
)

Question 1: How do I change the string 'yes some-times' to 'yes, sometimes'?

Question 2: How can I change the text in my category column? I want to get rid of the word "between", i.e. change 'I am between 21-30 years old' to 'I am 21-30 years old'.

I wanted to make the answers to Q1 factors, so I used: df <- mutate(df, across(where(is.character), as.factor))

However, 'yes, sometimes' and 'yes some-times' appear as two different levels. So that column is a factor with 3 levels, rather than 2.


Solution

    library(dplyr)
    
    df |> 
      mutate(category = gsub("between ", "", category, fixed = TRUE),
             Q1.do.you.feel.tired.everyday = ifelse(Q1.do.you.feel.tired.everyday == "yes some-times", "yes, sometimes", Q1.do.you.feel.tired.everyday),
             across(where(is.character), factor))
    #        id             category Q1.do.you.feel.tired.everyday
    # 1 person1  I am 0-10 years old                     no, never
    # 2 person2 I am 11-20 years old                yes, sometimes
    # 3 person3 I am 21-30 years old                yes, sometimes
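
If there are more spelling variants than the single one matched exactly above, a pattern-based cleanup may scale better. The following is only a sketch in base R under some assumptions: `clean_likert` and `q1_levels` are hypothetical names that do not appear in the question, the only inconsistencies are stray hyphens, casing, and a missing comma after "yes"/"no", and "no, never" is assumed to be the lowest level. Setting the levels explicitly also makes the integer codes predictable, which is one way to get numbers out of the factor.

```r
# Same example data as in the question
df <- data.frame(
  id = c("person1", "person2", "person3"),
  category = c("I am 0-10 years old", "I am 11-20 years old",
               "I am between 21-30 years old"),
  Q1.do.you.feel.tired.everyday = c("no, never", "yes, sometimes", "yes some-times")
)

# Hypothetical helper: normalise the free-text variants of a response
clean_likert <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("-", "", x, fixed = TRUE)                  # "some-times" -> "sometimes"
  sub("^(yes|no),?[[:space:]]+", "\\1, ", x)           # ensure "yes, " / "no, "
}

# Assumed ordering of the scale (lowest first)
q1_levels <- c("no, never", "yes, sometimes")

# Drop "between " from the category text
df$category <- sub("between ", "", df$category, fixed = TRUE)

# Clean the responses, then build the factor with explicit levels
df$Q1.do.you.feel.tired.everyday <- factor(
  clean_likert(df$Q1.do.you.feel.tired.everyday),
  levels = q1_levels
)

levels(df$Q1.do.you.feel.tired.everyday)
# [1] "no, never"      "yes, sometimes"

as.integer(df$Q1.do.you.feel.tired.everyday)
# [1] 1 2 2
```

Any response that still fails to match one of `q1_levels` after cleaning becomes `NA`, so checking `anyNA()` on the result is a quick way to spot variants the pattern missed.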