r, duplicates, unique, chord-diagram

R - Count combinations of values across rows (to make a chord diagram)


This follows on from a question I just posted here (providing link for context): R - Identify and remove duplicate rows based on two columns

The next thing I need to do is count combinations of values based on the Text_ID column. Here's a sample of what my data looks like:

Text_ID  Course_Code
39       MA3020
39       MA3120
59       MA3006
59       MA5902
89       MA2105
89       MA3006
89       MA5902
92       MA3023
92       MA3024
94       MA2023
94       MA3023
94       MA3024
97       MA3023
97       MA3024

To be clear, what I'm trying to ascertain is how many times two Course_Code share the same Text_ID. I imagine there are a few ways to approach this and/or present the data, but here's how it might look (FYI - I'm trying to get this data into a structure that will allow me to create a chord diagram, showing relationships between Course_Code):

From     To      Value
MA3020   MA3120  1
MA3006   MA5902  2
MA2105   MA3006  1
MA3023   MA3024  3
MA2023   MA3023  1

As you can see, MA3023 and MA3024 have the most Text_ID in common (3).

It gets a bit complicated (I think) because more than two course codes can share a Text_ID. E.g. Text_IDs 89 and 94 each appear with three different Course_Code values.

Hopefully that's all clear. If not, happy to elaborate. Ultimately, my goal is to get my data into a format/structure that will allow me to visualise relationships between Course_Code using Text_ID as the shared value. If there's another way to approach this, feel free to suggest it :)


Solution

  • We may take the crossprod of the table of df1 and keep the non-zero counts:

    subset(as.data.frame.table(crossprod(table(df1))), Freq != 0)
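
The one-liner above returns the full symmetric matrix in long form, so each pair shows up twice (A/B and B/A) and each course is also paired with itself. If you want one row per pair, as in the From/To/Value layout in the question, a minimal sketch could look like the following. It assumes df1 holds the sample data from the question with no repeated Text_ID/Course_Code rows; the names co and edges are just illustrative.

```r
# Sample data from the question (df1 is the name used in the answer above)
df1 <- data.frame(
  Text_ID = c(39, 39, 59, 59, 89, 89, 89, 92, 92, 94, 94, 94, 97, 97),
  Course_Code = c("MA3020", "MA3120", "MA3006", "MA5902", "MA2105", "MA3006",
                  "MA5902", "MA3023", "MA3024", "MA2023", "MA3023", "MA3024",
                  "MA3023", "MA3024")
)

# Course-by-course co-occurrence matrix: cell [i, j] counts the Text_IDs
# shared by course i and course j (the diagonal holds each course's own count)
co <- crossprod(table(df1))

# Zero out the diagonal and the lower triangle so each pair is kept only once
co[lower.tri(co, diag = TRUE)] <- 0

# Long format, dropping pairs that never share a Text_ID
edges <- subset(as.data.frame.table(co, stringsAsFactors = FALSE), Freq != 0)
names(edges) <- c("From", "To", "Value")
edges
```

Note that this also lists the extra pairs created by Text_IDs 89 and 94 (e.g. MA2105/MA5902), which the hand-written table in the question leaves out. A three-column data frame like edges should be in the shape that circlize::chordDiagram() accepts, though that step is not tested here.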