I am hoping to use the riverplot package to create a flow diagram. This package needs 'edges', which are the flows between levels. I want to create an edges data structure from a data frame. By way of example, here is some code to create my input data:
rp.df<-structure(list(ID = 1:20, X1 = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "A1", class = "factor"),
X2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A2",
"B2"), class = "factor"), X3 = structure(c(1L, 1L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
3L), .Label = c("A3", "B3", "C3"), class = "factor")), class = "data.frame", row.names = c(NA,
-20L))
table(rp.df$X1,rp.df$X2)
table(rp.df$X2,rp.df$X3)
This gives the following output:
> table(rp.df$X1,rp.df$X2)

     A2 B2
  A1 12  8
> table(rp.df$X2,rp.df$X3)

     A3 B3 C3
  A2  2  2  8
  B2  5  2  1
What I need is a data frame with the flows identified in the tables, e.g.:
N1  N2  Value
A1  A2  12
A1  B2  8
A2  A3  2
A2  B3  2
A2  C3  8
B2  A3  5
B2  B3  2
B2  C3  1
In reality I have 10 columns of edges and 16k in flows. I have tried using reshape2 to do this but struggled.
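Here's a base R solution, generalized for however many columns you have. It assumes the first column is an ID and every remaining column is a level, in flow order. Note that within each pair of columns the rows come out ordered by N2 first rather than N1, and any pair of levels that never occurs together will still appear with a Value of 0.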
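# Cross-tabulate each pair of adjacent level columns (column 1 is ID, so start at 2)
out <- lapply(2:(ncol(rp.df) - 1), function(i) {
  # as.data.frame() on a table gives one row per level pair: Var1, Var2, Freq
  as.data.frame(table(rp.df[, i], rp.df[, i + 1]))
})
# Stack the per-pair results and rename the columns to N1, N2, Value
setNames(do.call(rbind, out), c("N1", "N2", "Value"))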
#   N1 N2 Value
# 1 A1 A2    12
# 2 A1 B2     8
# 3 A2 A3     2
# 4 B2 A3     5
# 5 A2 B3     2
# 6 B2 B3     2
# 7 A2 C3     8
# 8 B2 C3     1
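If you want the rows in the same order as the example in the question, and no rows for level pairs that never occur, something along these lines might work. Treat it as a sketch rather than a tested solution for the full data: make_edges is a made-up helper name, the cols argument is assumed to list the level columns in flow order, and the input data is rebuilt here in a shorter form only so the snippet runs on its own.

```r
# Same example data as in the question, written more compactly
rp.df <- data.frame(
  ID = 1:20,
  X1 = factor(rep("A1", 20)),
  X2 = factor(rep(c("A2", "B2"), times = c(12, 8))),
  X3 = factor(c("A3", "A3", "B3", "B3", rep("C3", 8),
                rep("A3", 5), "B3", "B3", "C3"))
)

# Hypothetical helper: one row per observed flow between adjacent columns
make_edges <- function(df, cols) {
  pairs <- lapply(seq_len(length(cols) - 1), function(i) {
    tab <- table(df[[cols[i]]], df[[cols[i + 1]]])
    d <- as.data.frame(tab, stringsAsFactors = FALSE)  # Var1, Var2, Freq
    d <- d[d$Freq > 0, ]                               # drop flows that never occur
    d[order(d$Var1, d$Var2), ]                         # sort by source, then target
  })
  edges <- setNames(do.call(rbind, pairs), c("N1", "N2", "Value"))
  rownames(edges) <- NULL
  edges
}

edges <- make_edges(rp.df, c("X1", "X2", "X3"))
print(edges)
```

Sorting inside each pair, rather than after stacking, keeps the X1 to X2 flows ahead of the X2 to X3 flows regardless of how the level names sort. Using stringsAsFactors = FALSE keeps N1 and N2 as plain character columns, which avoids merging factor levels across pairs when the pieces are stacked.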