I feel like this is such a simple thing, but I'm having so much trouble wrapping my head around it for some reason.
Say I have a long-format data frame of two categorical variables like so:
df <- data.frame(
table_name = c("tbl1", "tbl1", "tbl1", "tbl2", "tbl2", "tbl2", "tbl3", "tbl3",
"tbl3"),
column_name = c("first name", "address", "city", "first name", "last name",
"ice cream", "last name", "potato", "ice cream")
)
which looks like this:
df
table_name column_name
tbl1 first name
tbl1 address
tbl1 city
tbl2 first name
tbl2 last name
tbl2 ice cream
tbl3 last name
tbl3 potato
tbl3 ice cream
I'd like to pivot_wider() to essentially create a matrix of matched column_name
between all combinations of table_name
. The expected output would be something like this:
tbl1 tbl2 tbl3
tbl1 3 1 0
tbl2 1 3 2
tbl3 0 2 3
If it helps at all, the use case here is that I want to build a network graph using igraph
. I've got a database that I want to visualize all the column_name
connections among table_name
.
We could self-join, count matches by table_name, and pivot_wider:
library(tidyverse)
inner_join(df, df, join_by(column_name)) |>
count(table_name.x, table_name.y) |>
pivot_wider(names_from = table_name.y, values_from = n, values_fill = 0)
# A tibble: 3 × 4
table_name.x tbl1 tbl2 tbl3
<chr> <int> <int> <int>
1 tbl1 3 1 0
2 tbl2 1 3 2
3 tbl3 0 2 3