Tags: r, string, google-sheets, google-forms, grepl

Need help processing multiple-response strings from a Google Form using R


I'm trying to process results from a Google Form in R and have hit a wall in dealing with string data.

The question can be seen here:

[screenshot of the form question]

Google returns the results in a single column with a comma separating each response.

They end up looking like this:

ID | Type of Research
=====================
1  | Policy analysis, Review of other research
2  | Bla
3  | Review of other research, Original empirical research
4  | Policy analysis, Theoretical 
5  | Review of other research

I've used grepl to create logical columns and a data.frame for the three pre-selected responses.

Private$ResearchTypeOriginal <- grepl("Original", Private$ResearchType)
Private$ResearchTypeReview <- grepl("Review", Private$ResearchType)
Private$ResearchTypePolicy <- grepl("Policy", Private$ResearchType)

ResearchTypeGrid <- data.frame(Private$ResearchTypeOriginal, Private$ResearchTypeReview, Private$ResearchTypePolicy)

This works great. However, I also need to pull out the "other"s. I was using

ResearchTypeOther <- subset(
  Private,
  !grepl("Original", Private$ResearchType) &
    !grepl("Review", Private$ResearchType) &
    !grepl("Policy", Private$ResearchType),
  select = c(ID, ResearchType, PubLang, Reviewer)
)
ResearchTypeOther <- na.omit(ResearchTypeOther)

but I just realized that if a response contains both a pre-selected option AND an open-ended one, the open-ended part is lost with this method. It works fine for returning the "Bla" responses, but only for rows that are exclusively "other."

In other words, this produces

ID |  Type of Research
=======================
2  |  Bla 

But what I'd like is

ID |  Type of Research
======================
2  |  Bla
4  |  Policy analysis, Theoretical

This is my first time posting on SO, and I'm obviously new at R, so please excuse any mistakes in how I'm asking the question. I'm sorry if I'm not phrasing this very well. I have ~20 other questions with the same problem, so I need a flexible solution.

Thanks for any help.


Solution

  • You could "regex your way through", along the lines of:

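    # The sample rows as a character vector (readLines(n = 5) only works
    # when the rows are pasted into an interactive console)
    doc <- c("1  | Policy analysis, Review of other research",
             "2  | Bla",
             "3  | Review of research, Original empirical research",
             "4  | Policy analysis, Theoretical ",
             "5  | Review of other research")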
    
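    # The pre-selected answer options
    items <- c("Review of other research", 
               "Original empirical research", 
               "Policy analysis")
    # Strip the "ID | " prefix, then delete every pre-selected option along
    # with an adjacent ", " separator; what is left is the free-text answer
    (others <- gsub(sprintf("(,\\s)?(%s)(,\\s)?", paste(items, collapse = "|")), "", 
               sub(".*\\|\\s(.*)", "\\1", doc)))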
    # [1] ""                   "Bla"                "Review of research"
    # [4] "Theoretical "       ""  
    
    
    # Conversely, delete the free-text answers to keep only the pre-selected options
    sub(sprintf("(,\\s)?(%s)(,\\s)?", paste(others[others != ""], collapse = "|")), "", doc)
    # [1] "1  | Policy analysis, Review of other research"
    # [2] "2  | "                                         
    # [3] "3  | Original empirical research"              
    # [4] "4  | Policy analysis"                          
    # [5] "5  | Review of other research"
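
If the goal is the table asked for in the question (every row that contains at least one open-ended response), a regex-free variant may be easier to reuse across the other ~20 questions. The following is only a sketch under some assumptions: the toy `Private` data frame mirrors the question's example, `extract_others` is a hypothetical helper name, and neither the preset options nor the free-text answers are assumed to contain commas themselves. It splits each cell on commas, trims whitespace, and uses `setdiff` to keep whatever is not a preset option.

```r
# Toy data mirroring the question's table (column names are assumptions)
Private <- data.frame(
  ID = 1:5,
  ResearchType = c("Policy analysis, Review of other research",
                   "Bla",
                   "Review of other research, Original empirical research",
                   "Policy analysis, Theoretical ",
                   "Review of other research"),
  stringsAsFactors = FALSE
)

# The pre-selected answer options, spelled exactly as the form exports them
items <- c("Review of other research",
           "Original empirical research",
           "Policy analysis")

# Hypothetical helper: for each cell, return the responses that are not presets
extract_others <- function(x, presets) {
  parts <- strsplit(x, ",", fixed = TRUE)   # split each cell on commas
  lapply(parts, function(p) setdiff(trimws(p), presets))
}

others    <- extract_others(Private$ResearchType, items)
has_other <- lengths(others) > 0            # TRUE where any non-preset answer exists

# Keep the full original response plus the isolated free-text part
ResearchTypeOther <- Private[has_other, c("ID", "ResearchType")]
ResearchTypeOther$Other <- vapply(others[has_other], paste, character(1),
                                  collapse = ", ")
ResearchTypeOther   # rows with ID 2 and 4
```

Because the preset list is passed in as an argument, the same helper could be called once per question with a different `items` vector. Note that an `NA` cell is not a preset and would therefore be reported as "other", so it may be worth filtering those out first.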