I'm trying to process results from a Google Form in R and have hit a wall in dealing with string data.
The question can be seen here:
Google returns the results in a single column with a comma separating each response.
They end up looking like
ID | Type of Research
=====================
1 | Policy analysis, Review of other research
2 | Bla
3 | Review of other research, Original empirical research
4 | Policy analysis, Theoretical
5 | Review of other research
I've used grepl to create logical columns and a data.frame for the three pre-selected responses.
Private$ResearchTypeOriginal <- grepl("Original", Private$ResearchType)
Private$ResearchTypeReview <- grepl("Review", Private$ResearchType)
Private$ResearchTypePolicy <- grepl("Policy", Private$ResearchType)
ResearchTypeGrid <- data.frame(Private$ResearchTypeOriginal, Private$ResearchTypeReview, Private$ResearchTypePolicy)
This works great. However, I also need to pull out the "other"s. I was using
ResearchTypeOther <- subset(Private,
                            !grepl("Original", Private$ResearchType) &
                              !grepl("Review", Private$ResearchType) &
                              !grepl("Policy", Private$ResearchType),
                            select = c(ID, ResearchType, PubLang, Reviewer))
ResearchTypeOther <- na.omit(ResearchTypeOther)
but I just realized that if a response has both a pre-selected response AND an open-ended one, the open-ended part is lost using this method. It works fine for giving me the "Bla" responses, but only the ones that are exclusively "other."
In other words, this produces
ID | Type of Research
=======================
2 | Bla
But what I'd like is
ID | Type of Research
======================
2 | Bla
4 | Policy analysis, Theoretical
This is my first time posting on SO, and I'm obviously new at R, so please excuse any mistakes in how I'm asking the question. I'm sorry if I'm not phrasing this very well. I have ~20 other questions with the same problem, so I need a flexible solution.
Thanks for any help.
You could "regex your way through" along the lines of
doc <- readLines(n = 5)
1 | Policy analysis, Review of other research
2 | Bla
3 | Review of research, Original empirical research
4 | Policy analysis, Theoretical
5 | Review of other research
items <- c("Review of other research",
           "Original empirical research",
           "Policy analysis")
(others <- gsub(sprintf("(,\\s)?(%s)(,\\s)?", paste(items, collapse = "|")), "",
                sub(".*\\|\\s(.*)", "\\1", doc)))
# [1] ""                   "Bla"                "Review of research"
# [4] "Theoretical"        ""
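To see what that first call is doing, it may help to build the pattern separately and run it on a single row. This is only a sketch of the same idea: `pattern`, `answer` and `leftover` are names introduced here for illustration, and the input string is row 4 from the example.

```r
items <- c("Review of other research",
           "Original empirical research",
           "Policy analysis")

# Join the pre-selected answers into one alternation, with an optional
# ", " on either side so the separator is removed along with the item
pattern <- sprintf("(,\\s)?(%s)(,\\s)?", paste(items, collapse = "|"))
cat(pattern, "\n")

# Strip the row prefix ("4 | "), then delete every pre-selected answer
answer <- sub(".*\\|\\s(.*)", "\\1", "4 | Policy analysis, Theoretical")
leftover <- gsub(pattern, "", answer)
leftover
# [1] "Theoretical"
```

Whatever survives the `gsub` is the open-ended part of the response; an empty string means the row held only pre-selected answers.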
sub(sprintf("(,\\s)?(%s)(,\\s)?", paste(others[others != ""], collapse = "|")), "", doc)
# [1] "1 | Policy analysis, Review of other research"
# [2] "2 | "
# [3] "3 | Original empirical research"
# [4] "4 | Policy analysis"
# [5] "5 | Review of other research"
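If what you need is the rows themselves (IDs 2 and 4 in the question) rather than the stripped strings, a different approach is to split each response on the comma and check the pieces against the pre-selected list. The sketch below assumes a data frame shaped like the one in the question; `has_other` is a hypothetical helper, not an existing function, and the sample data is made up to match the example table. It also assumes that open-ended answers do not themselves contain commas.

```r
# Hypothetical data mirroring the example table in the question
Private <- data.frame(
  ID = 1:5,
  ResearchType = c("Policy analysis, Review of other research",
                   "Bla",
                   "Review of other research, Original empirical research",
                   "Policy analysis, Theoretical",
                   "Review of other research"),
  stringsAsFactors = FALSE
)

items <- c("Review of other research",
           "Original empirical research",
           "Policy analysis")

# Hypothetical helper: TRUE where a response contains at least one
# answer that is not in the pre-selected list
has_other <- function(responses, items) {
  # Split on commas and trim the whitespace around each piece
  parts <- lapply(strsplit(as.character(responses), ",", fixed = TRUE), trimws)
  # Ignore missing and empty pieces, flag anything not pre-selected
  vapply(parts,
         function(p) any(!is.na(p) & nzchar(p) & !(p %in% items)),
         logical(1))
}

ResearchTypeOther <- Private[has_other(Private$ResearchType, items),
                             c("ID", "ResearchType")]
ResearchTypeOther$ID
# [1] 2 4
```

Since the helper only takes a character vector and the list of pre-selected answers, it could be reused for the other questions by passing a different column and a different `items` vector.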