[SOLVED] R dplyr solution to pivot categorical matches wider

R dplyr solution to pivot categorical matches wider

I feel like this is such a simple thing, but I'm having so much trouble wrapping my head around it for some reason.

Say I have a long-format data frame of two categorical variables like so:

df <- data.frame(
  table_name = c("tbl1", "tbl1", "tbl1", "tbl2", "tbl2", "tbl2", "tbl3", "tbl3",
                 "tbl3"),
  column_name = c("first name", "address", "city", "first name", "last name",
                  "ice cream", "last name", "potato", "ice cream")
)

which looks like this:

df

table_name  column_name
tbl1        first name          
tbl1        address         
tbl1        city            
tbl2        first name          
tbl2        last name           
tbl2        ice cream           
tbl3        last name           
tbl3        potato          
tbl3        ice cream

I'd like to pivot_wider() to essentially create a matrix of matched column_name between all combinations of table_name. The expected output would be something like this:

     tbl1 tbl2 tbl3
tbl1    3    1    0
tbl2    1    3    2
tbl3    0    2    3

If it helps at all, the use case here is that I want to build a network graph using igraph. I've got a database that I want to visualize all the column_name connections among table_name.

Solution

We could self-join, count matches by table_name, and pivot_wider:

library(tidyverse)
inner_join(df, df, join_by(column_name)) |>
  count(table_name.x, table_name.y) |>
  pivot_wider(names_from = table_name.y, values_from = n, values_fill = 0)

# A tibble: 3 × 4
  table_name.x  tbl1  tbl2  tbl3
  <chr>        <int> <int> <int>
1 tbl1             3     1     0
2 tbl2             1     3     2
3 tbl3             0     2     3