[SOLVED] Separating untidy data in R

Separating untidy data in R - ODK cleanup

To tidy part of a dataset I need to separate one column into several. These data takes the form similar to this:

set.seed(2133)
df <- data.frame(a = paste(sample(1:9, 5, replace=T), sample(c("", "%2", "%3"), 5, replace=T),  sample(c("", "%3", "%4"), 5, replace=T), sep="")) 
df
      a
1     6
2 2%3%4
3   6%2
4   3%2
5 5%2%4

Tidyr's separate function wont do the job and the best idea I have is a series of ifelse statements, like this:

df$One <- ifelse(grepl("1", df$a) == T, 1, 0)
df$Two <- ifelse(grepl("2", df$a) == T, 1, 0)
          a One Two
1     6   0   0
2 2%3%4   0   1
3   6%2   0   1
4   3%2   0   1
5 5%2%4   0   1

What's the best way to go about such tidying. I'm sure many people working with Open Data Kit (ODK) for data collection will come across this.

Solution

We can loop the patterns (i.e. 1, 2) with sapply, get a logical matrix, coerce to binary by wrapping with + and assign it as new columns in 'df'

df[c("One", "Two")] <- +(sapply(1:2, grepl, df$a))
df
#     a One Two
#1     6   0   0
#2 2%3%4   0   1
#3   6%2   0   1
#4   3%2   0   1
#5 5%2%4   0   1

If this is to get the binary output for each unique value in 'a' column, we can split the strings in 'a' by the delimiter (%), the output list can be converted to data.frame using stack, transform the 'values' column to factor class by specifying the levels and do the table

table(transform(stack(setNames(strsplit(as.character(df$a), "[%]"),
       1:nrow(df))), values = factor(values, levels= 1:6))[2:1])

Or we can use a convenient function mtabulate from qdapTools after splitting.

library(qdapTools)
mtabulate(strsplit(as.character(df$a), "[%]"))