I have a data frame with a column of space-separated years that I would like to turn into dummy variables, one per year.
Consider the following toy data:
raw <- data.frame(textcol = c("case1", "case2", "case3"), years=c('1996 1997 1998','1997 1999 2000', '1996 1998 2000'))
  textcol          years
1   case1 1996 1997 1998
2   case2 1997 1999 2000
3   case3 1996 1998 2000
I would now like to transform the data frame into this:
  textcol `1996` `1997` `1998` `1999` `2000`
1 case1        1      1      1      0      0
2 case2        0      1      0      1      1
3 case3        1      0      1      0      1
I tried using separate() and str_split() to no avail. Can someone point me to the right approach?
Use separate_rows to get each year in a separate row and then use table. (Append %>% as.data.frame.matrix to the pipeline if you want it as a data frame.)
library(tidyr)
tab <- raw %>% separate_rows(years) %>% table
giving:
tab
##        years
## textcol 1996 1997 1998 1999 2000
##   case1    1    1    1    0    0
##   case2    0    1    0    1    1
##   case3    1    0    1    0    1
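If a data frame shaped like the one in the question is needed, with textcol as an ordinary column instead of row names, something along these lines should work. This is a minimal sketch on the toy data; the name wide is only illustrative and not part of the original answer.

```r
library(tidyr)

raw <- data.frame(
  textcol = c("case1", "case2", "case3"),
  years = c("1996 1997 1998", "1997 1999 2000", "1996 1998 2000")
)

tab <- raw %>% separate_rows(years) %>% table

# as.data.frame.matrix keeps the wide shape; plain as.data.frame on a table
# would return a long (textcol, years, Freq) data frame instead.
wide <- as.data.frame.matrix(tab)

# Move the row names into a regular column to match the layout in the question.
wide <- cbind(textcol = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

The year columns keep non-syntactic names such as 1996, so they have to be referenced with backticks or wide[["1996"]].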
We can also display this as a graph. Convert tab to an igraph object, g. Then create a custom layout, lay, that keeps the vertices in their original order, because igraph's default bipartite layout reorders them to minimize edge crossings. Finally, plot it.
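library(igraph)
# Newer igraph releases call this graph_from_biadjacency_matrix(); the older name is deprecated there.
g <- graph_from_incidence_matrix(tab)
# Sort the x coordinates within each row (V2) so the vertices keep their original order.
lay <- with(as.data.frame(layout_as_bipartite(g)),
            cbind(ave(V1, V2, FUN = sort), V2))
plot(g, layout = lay, vertex.size = 2)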