r, ggally, ggnetwork

Re-shaping data for plotting a network


I have a data set that I would like to re-shape for plotting as a network (following the work done here). The initial data frame looks like this:

authors <- c('Author A', 'Author B', 'Author C', 
             'Author A', 'Author D', 'Author C')
affiliation <- c('University 1', 'University 2', 'University 1', 
                 'University 1', 'Institute 3', 'University 1')
manuscript <- c('Manuscript A', 'Manuscript A', 'Manuscript A', 
                'Manuscript B', 'Manuscript B', 'Manuscript B')
df <- data.frame(authors, affiliation, manuscript)

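I would like to re-shape this so that, for each manuscript, I get every ordered pair of co-authors (each author paired with every other author on that manuscript), along with the affiliation of the first author in the pair, which I am calling the primary author (I hope the way I am asking this question makes sense). This would result in the following data frame: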

df_network <- data.frame('primary_author'= c('Author A', 'Author A', 
                                             'Author B', 'Author B', 
                                             'Author C', 'Author C', 
                                             'Author A','Author A', 
                                             'Author D', 'Author D', 
                                             'Author C', 'Author C'),
                         'connection'= c('Author B', 'Author C', 
                                         'Author A', 'Author C', 
                                         'Author A', 'Author B', 
                                         'Author D', 'Author C', 
                                         'Author A', 'Author C', 
                                         'Author A', 'Author D'),
                         'primary_affiliation' = c('University 1', 'University 1',
                                                   'University 2', 'University 2',
                                                   'University 1', 'University 1',
                                                   'University 1', 'University 1',
                                                   'Institute 3', 'Institute 3',
                                                   'University 1', 'University 1'),
                         'manuscript' = c('Manuscript A', 'Manuscript A',
                                          'Manuscript A', 'Manuscript A',
                                          'Manuscript A', 'Manuscript A',
                                          'Manuscript B', 'Manuscript B',
                                          'Manuscript B', 'Manuscript B',
                                          'Manuscript B', 'Manuscript B'))

Of course I can re-shape the data by hand, but this is incredibly tedious, especially as the list gets very long. I have done this manually before, and once the data is in the shape of df_network the resulting plot is quite nice. Any tips or tricks anyone could offer would be greatly appreciated.


Solution

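  • Join the data frame to itself on manuscript, so that every author is paired with every author on the same manuscript, then drop the rows where an author is paired with themselves:

    library(dplyr)

    df %>%
      # Self-join: one row per (author, co-author) pair within a manuscript.
      # dplyr >= 1.1.0 warns about the many-to-many match here; that is expected
      # and can be silenced with relationship = "many-to-many".
      left_join(df, by = "manuscript") %>%
      # Remove self-pairs such as Author A / Author A
      filter(authors.x != authors.y) %>%
      # Keep the primary author's affiliation (.x side) and rename the columns
      select(primary_author = authors.x,
             connection = authors.y,
             primary_affiliation = affiliation.x,
             manuscript)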
    

    Output:

       primary_author connection primary_affiliation   manuscript
    1        Author A   Author B        University 1 Manuscript A
    2        Author A   Author C        University 1 Manuscript A
    3        Author B   Author A        University 2 Manuscript A
    4        Author B   Author C        University 2 Manuscript A
    5        Author C   Author A        University 1 Manuscript A
    6        Author C   Author B        University 1 Manuscript A
    7        Author A   Author D        University 1 Manuscript B
    8        Author A   Author C        University 1 Manuscript B
    9        Author D   Author A         Institute 3 Manuscript B
    10       Author D   Author C         Institute 3 Manuscript B
    11       Author C   Author A        University 1 Manuscript B
    12       Author C   Author D        University 1 Manuscript B
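
If you would rather avoid the dplyr dependency, the same self-join idea can probably be written with base R's merge(). This is only a minimal sketch of that alternative, not part of the answer above: the names pairs and df_network_base are made up for illustration, and merge() does not promise the same row order as the dplyr output, so compare the contents rather than the row positions.

```r
# Example data from the question
authors <- c('Author A', 'Author B', 'Author C',
             'Author A', 'Author D', 'Author C')
affiliation <- c('University 1', 'University 2', 'University 1',
                 'University 1', 'Institute 3', 'University 1')
manuscript <- c('Manuscript A', 'Manuscript A', 'Manuscript A',
                'Manuscript B', 'Manuscript B', 'Manuscript B')
df <- data.frame(authors, affiliation, manuscript)

# Self-merge on manuscript: each author row is paired with every
# author row of the same manuscript (including itself).
# "pairs" is a hypothetical name used only in this sketch.
pairs <- merge(df, df, by = "manuscript", suffixes = c(".x", ".y"))

# Drop the self-pairs (Author A / Author A, etc.)
pairs <- pairs[pairs$authors.x != pairs$authors.y, ]

# Keep the .x side's affiliation and rename to the requested columns
df_network_base <- data.frame(
  primary_author      = pairs$authors.x,
  connection          = pairs$authors.y,
  primary_affiliation = pairs$affiliation.x,
  manuscript          = pairs$manuscript
)
```

Because the first two columns are the two ends of each connection, the result is already an edge list. Functions such as igraph::graph_from_data_frame() read the first two columns as edges and treat the remaining columns as edge attributes, so it should be usable as a starting point for the network plot.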