This follows on from a question I just posted here (providing link for context): R - Identify and remove duplicate rows based on two columns
The next thing I need to do is count combinations of values based on the Text_ID column. Here's a sample of what my data looks like:
Text_ID Course_Code
39 MA3020
39 MA3120
59 MA3006
59 MA5902
89 MA2105
89 MA3006
89 MA5902
92 MA3023
92 MA3024
94 MA2023
94 MA3023
94 MA3024
97 MA3023
97 MA3024
To be clear, what I'm trying to ascertain is how many times two Course_Code values share the same Text_ID. I imagine there are a few ways to approach this and/or present the data, but here's how it might look (FYI - I'm trying to get this data into a structure that will allow me to create a chord diagram, showing relationships between Course_Code values):
From To Value
MA3020 MA3120 1
MA3006 MA5902 2
MA2105 MA3006 1
MA3023 MA3024 3
MA2023 MA3023 1
As you can see, MA3023 and MA3024 have the most Text_ID values in common (3).
It gets a bit complicated (I think) because more than two course codes can share a Text_ID. E.g. Text_ID 89 and 94 each appear with three different Course_Code values.
Hopefully that's all clear. If not, happy to elaborate. Ultimately, my goal is to get my data into a format/structure that will allow me to visualise relationships between Course_Code values using Text_ID as the shared value. If there's another way to approach this, feel free to suggest it :)
We may use crossprod on the contingency table of the two columns and keep the non-zero counts:
subset(as.data.frame.table(crossprod(table(df1))), Freq != 0)
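As a rough sketch of how this plays out on the sample data: the one-liner returns every ordered pair, so each pair appears twice (A/B and B/A) and each course is also paired with itself. The version below assumes the data frame is called df1 (as in the one-liner) and that duplicate Text_ID/Course_Code rows have already been removed, so the table holds only 0s and 1s. The lower.tri() step is an addition, not part of the original suggestion, that keeps each unordered pair once; the From/To/Value column names are borrowed from the layout in the question.

```r
# Sample data from the question
df1 <- data.frame(
  Text_ID = c(39, 39, 59, 59, 89, 89, 89, 92, 92, 94, 94, 94, 97, 97),
  Course_Code = c("MA3020", "MA3120", "MA3006", "MA5902", "MA2105", "MA3006",
                  "MA5902", "MA3023", "MA3024", "MA2023", "MA3023", "MA3024",
                  "MA3023", "MA3024")
)

# Text_ID x Course_Code contingency table (1 where the combination occurs)
tab <- table(df1)

# Course x course matrix: entry [i, j] counts the Text_IDs that
# course i and course j have in common
co <- crossprod(tab)

# Added step (assumption: unordered pairs are wanted): zero out the
# diagonal and the lower triangle so each pair is kept only once
co[lower.tri(co, diag = TRUE)] <- 0

# Long format with one row per pair that shares at least one Text_ID
edges <- subset(as.data.frame.table(co), Freq != 0)
names(edges) <- c("From", "To", "Value")
edges
```

On the sample this gives seven pairs rather than the five listed in the question, because the three-course Text_IDs (89 and 94) also contribute MA2105/MA5902 and MA2023/MA3024, each with a count of 1.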