r survey

Trying to analyze the results from an open-answer question from a survey


I am learning data analysis in RStudio, using an SPSS dataset as an example. I am having trouble with an open-answer question in which respondents wrote the region they come from. Many respondents wrote the same region slightly differently (different capitalization or diacritics), so R treats those answers as distinct values even though they refer to the same region.

Example:

x<- c("Bucharest", "ploiesti", "Focsani", 
      "bucharest", "sinaia", "Ploiești", "Sinaia", "BUCHAREST", "Bucharest", "Ploiesti")

table(x)

The result of tabulating it is:

> table(x)
x
bucharest Bucharest BUCHAREST   Focsani  ploiesti  Ploiesti  Ploiești 
        1         2         1         1         1         1         1 
   sinaia    Sinaia 
        1         1  

I'm not sure this is the best example, since my actual problem concerns a variable (column) in a dataset, but I hope it helps.

I tried the str_to_title() function from the stringr package, but I get the following warning:

Warning message:
In stri_trans_totitle(string, opts_brkiter = stri_opts_brkiter(locale = locale)) :
  argument is not an atomic vector; coercing

I want to make the answers uniform (e.g., turn every version of "Bucharest" into a single spelling that is recognized as the same answer, and do the same for the other answers), and then build a table showing how many times each answer occurs.


Solution

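    library(dplyr)    # mutate(), group_by(), summarise(), %>%
    library(stringr)  # str_to_title(), str_replace_all()

    # Normalise case first, then replace every 'ș' so "Ploiești" matches "Ploiesti"
    x <- data.frame(region = c("Bucharest", "ploiesti", "Focsani", 
          "bucharest", "sinaia", "Ploiești", "Sinaia", "BUCHAREST", "Bucharest", "Ploiesti")) %>% 
      mutate(uniformName = str_to_title(region), 
             uniformName = str_replace_all(uniformName, 'ș', 's')) %>% 
      group_by(uniformName) %>% 
      summarise(count = n())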
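
The warning quoted in the question ("argument is not an atomic vector; coercing") typically appears when a whole data frame or list, rather than a single column, is passed to str_to_title(). Below is a minimal sketch of applying the same cleaning to one column of a dataset. The names `df`, `region` and `region_clean` are assumptions standing in for the real SPSS data, which is not shown in the question.

```r
library(stringr)

# Hypothetical stand-in for the imported SPSS data; `df` and `region` are assumed names
df <- data.frame(
  region = c("Bucharest", "ploiesti", "Focsani", "bucharest", "sinaia",
             "Ploiești", "Sinaia", "BUCHAREST", "Bucharest", "Ploiesti"),
  stringsAsFactors = FALSE
)

# Passing the data frame itself (not a column) is the likely cause of the warning:
# str_to_title(df)

# Pass the column instead; as.character() also covers a column imported as a factor
df$region_clean <- str_to_title(as.character(df$region))
df$region_clean <- str_replace_all(df$region_clean, "ș", "s")

# Frequency table of the cleaned answers
counts <- table(df$region_clean)
counts
```

Keeping the cleaned values in a new column leaves the original answers untouched, so the two can be compared if a replacement turns out to be wrong.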
    

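
Replacing 'ș' by hand only covers that one character. If the answers contain other diacritics, a more general variant is to transliterate to ASCII before title-casing. The sketch below uses the stringi package (which stringr is built on). The helper name `normalize_region` and the sample answers are made up for illustration. It only handles differences in whitespace, case and diacritics, so genuine misspellings would still need separate treatment.

```r
library(stringi)

# Hypothetical helper: trim whitespace, strip diacritics, then title-case
normalize_region <- function(x) {
  x <- as.character(x)                        # also accepts factor input
  x <- stri_trim_both(x)                      # drop leading/trailing whitespace
  x <- stri_trans_general(x, "Latin-ASCII")   # e.g. "ș" becomes "s"
  stri_trans_totitle(x)                       # "PLOIESTI" becomes "Ploiesti"
}

# Made-up answers, not taken from the real survey
answers <- c(" Ploiești", "PLOIESTI ", "Focșani", "focsani", "Brașov")
cleaned <- normalize_region(answers)

# Frequency table of the normalised answers
table(cleaned)
```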