rcountcrosstabcrosstable

How to create a cross tabulation table between two variables with the counts in R?


I have a data frame with two columns that I want to cross tabulate. The data also includes the counts for the combination. I am trying to create the cross table and include those counts within the table. I am struggling to use the counts from the dataframe into the cross table.

> df %>% arrange(d1)%>%  head()
  count d1 d2
1     3  1 15
2    86  1 14
3    13  1 12
4   186  1 16
5    29  1  9
6    86  1 13


> table(df$d1,df$d2)
    
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
  1  0 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1
  2  1 0 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1
  3  1 1 0 1 1 1 1 1 1  1  1  1  1  1  1  1  1
  

Expecting [1,15] and [1,14] to show 3, 86 based on the counts in df table. Right now it shows 0s and 1s only based on if the combinations exists.

Here is my sample data:

structure(list(count = c(37L, 6L, 44L, 21L, 8L, 3L, 9L, 17L, 
13L, 32L, 106L, 34L, 505L, 173L, 12L, 2L, 4L, 45L, 3L, 43L, 5L, 
16L, 1L, 27L, 17L, 3L, 4L, 1L, 27L, 86L, 79L, 10L, 161L, 32L, 
3L, 209L, 9L, 83L, 23L, 108L, 161L, 22L, 4L, 16L, 2L, 6L, 67L, 
86L, 3L, 1L, 14L, 14L, 111L, 5L, 5L, 44L, 105L, 13L, 269L, 186L, 
3L, 5L, 5L, 27L, 3L, 186L, 58L, 29L, 34L, 43L, 8L, 92L, 9L, 455L, 
22L, 32L, 4L, 14L, 58L, 22L, 190L, 94L, 27L, 152L, 264L, 36L, 
1L, 505L, 86L, 44L, 3L, 1L, 79L, 75L, 12L, 32L, 11L, 197L, 90L, 
269L, 9L, 6L, 47L, 14L, 158L, 303L, 335L, 37L, 33L, 3L, 83L, 
15L, 31L, 124L, 146L, 26L, 36L, 27L, 37L, 31L, 108L, 121L, 111L, 
11L, 5L, 26L, 166L, 11L, 18L, 11L, 8L, 15L, 18L, 165L, 80L, 14L, 
5L, 3L, 492L, 7L, 90L, 146L, 130L, 197L, 165L, 34L, 22L, 122L, 
29L, 74L, 455L, 303L, 45L, 5L, 173L, 33L, 24L, 229L, 79L, 43L, 
68L, 16L, 10L, 73L, 35L, 99L, 229L, 94L, 23L, 492L, 18L, 84L, 
92L, 86L, 35L, 31L, 1L, 23L, 8L, 121L, 1L, 173L, 400L, 124L, 
20L, 11L, 6L, 3L, 166L, 84L, 31L, 122L, 15L, 24L, 70L, 43L, 74L, 
209L, 45L, 158L, 44L, 15L, 37L, 35L, 27L, 68L, 20L, 15L, 11L, 
21L, 4L, 18L, 44L, 234L, 80L, 10L, 44L, 4L, 47L, 7L, 67L, 10L, 
3L, 173L, 99L, 79L, 130L, 3L, 75L, 1L, 335L, 14L, 106L, 15L, 
34L, 190L, 152L, 16L, 73L, 45L, 1L, 3L, 264L, 160L, 23L, 1L, 
160L, 400L, 105L, 234L, 70L, 35L), d1 = c(10L, 17L, 5L, 3L, 12L, 
1L, 10L, 10L, 12L, 7L, 14L, 6L, 16L, 3L, 7L, 9L, 7L, 13L, 4L, 
8L, 9L, 2L, 7L, 16L, 8L, 15L, 12L, 12L, 2L, 1L, 16L, 15L, 14L, 
5L, 8L, 14L, 11L, 11L, 4L, 4L, 13L, 7L, 12L, 11L, 17L, 8L, 4L, 
13L, 15L, 15L, 12L, 13L, 4L, 5L, 5L, 5L, 2L, 1L, 2L, 1L, 2L, 
13L, 12L, 5L, 3L, 16L, 10L, 1L, 14L, 2L, 7L, 9L, 15L, 16L, 3L, 
11L, 8L, 12L, 9L, 9L, 14L, 11L, 8L, 11L, 16L, 10L, 17L, 6L, 1L, 
3L, 5L, 1L, 3L, 11L, 10L, 14L, 5L, 3L, 6L, 16L, 15L, 15L, 4L, 
14L, 14L, 16L, 16L, 8L, 3L, 7L, 1L, 15L, 6L, 11L, 6L, 5L, 1L, 
15L, 2L, 7L, 14L, 2L, 13L, 10L, 6L, 1L, 3L, 15L, 2L, 3L, 9L, 
7L, 11L, 3L, 10L, 16L, 17L, 7L, 3L, 15L, 1L, 2L, 10L, 13L, 4L, 
5L, 8L, 4L, 9L, 16L, 13L, 4L, 10L, 17L, 6L, 8L, 7L, 11L, 8L, 
9L, 16L, 7L, 14L, 9L, 4L, 3L, 13L, 4L, 8L, 16L, 8L, 6L, 14L, 
14L, 9L, 13L, 17L, 12L, 10L, 1L, 17L, 11L, 16L, 2L, 1L, 7L, 14L, 
12L, 2L, 9L, 8L, 6L, 4L, 13L, 9L, 6L, 5L, 6L, 12L, 11L, 4L, 2L, 
14L, 12L, 11L, 7L, 8L, 6L, 1L, 12L, 9L, 12L, 5L, 3L, 6L, 15L, 
13L, 8L, 10L, 4L, 1L, 13L, 17L, 13L, 1L, 10L, 14L, 17L, 9L, 2L, 
10L, 17L, 2L, 12L, 5L, 3L, 6L, 7L, 3L, 16L, 15L, 5L, 9L, 2L, 
6L, 5L, 13L, 11L, 4L, 6L, 13L, 4L), d2 = c(2L, 14L, 4L, 12L, 
10L, 15L, 15L, 8L, 1L, 14L, 2L, 5L, 6L, 11L, 10L, 17L, 8L, 10L, 
17L, 6L, 5L, 7L, 15L, 15L, 10L, 1L, 9L, 17L, 5L, 14L, 8L, 14L, 
13L, 11L, 5L, 6L, 15L, 1L, 8L, 14L, 14L, 3L, 8L, 7L, 9L, 15L, 
1L, 1L, 2L, 5L, 13L, 12L, 13L, 12L, 9L, 3L, 4L, 12L, 16L, 16L, 
15L, 17L, 5L, 2L, 17L, 1L, 9L, 9L, 5L, 9L, 9L, 14L, 11L, 13L, 
7L, 5L, 12L, 14L, 10L, 8L, 3L, 4L, 11L, 6L, 9L, 1L, 1L, 16L, 
13L, 5L, 8L, 17L, 10L, 9L, 7L, 7L, 10L, 13L, 1L, 2L, 10L, 8L, 
10L, 12L, 11L, 4L, 10L, 14L, 8L, 12L, 11L, 6L, 7L, 2L, 2L, 1L, 
10L, 16L, 10L, 6L, 4L, 1L, 4L, 5L, 17L, 5L, 2L, 3L, 8L, 15L, 
7L, 4L, 12L, 4L, 6L, 17L, 6L, 5L, 16L, 4L, 6L, 6L, 14L, 3L, 3L, 
14L, 9L, 6L, 1L, 5L, 16L, 16L, 13L, 13L, 13L, 3L, 13L, 13L, 16L, 
2L, 7L, 2L, 15L, 3L, 12L, 1L, 11L, 11L, 4L, 3L, 2L, 9L, 9L, 1L, 
4L, 8L, 12L, 6L, 12L, 2L, 2L, 3L, 11L, 11L, 8L, 1L, 17L, 7L, 
3L, 6L, 13L, 4L, 7L, 7L, 13L, 8L, 16L, 14L, 16L, 14L, 5L, 12L, 
8L, 4L, 8L, 16L, 1L, 15L, 7L, 3L, 12L, 11L, 13L, 6L, 10L, 13L, 
5L, 7L, 4L, 15L, 4L, 15L, 4L, 6L, 3L, 3L, 10L, 3L, 11L, 17L, 
16L, 16L, 14L, 2L, 6L, 14L, 11L, 11L, 9L, 12L, 7L, 7L, 16L, 13L, 
12L, 15L, 2L, 16L, 2L, 3L, 9L, 9L)), row.names = c(NA, 252L), class = "data.frame")

Solution

  • xtabs may be useful here

    > xtabs(count ~ d1 + d2, df)
        d2
    d1     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
      1    0 121  99  67  26  90  11  20  29  36  83  13  86  86   3 186   1
      2  121   0 166 105  27 146  16  18  43  37 124  15 160 106   3 269   1
      3   99 166   0 165  44 234  22  33  73  79 173  21 197 190  11 492   3
      4   67 105 165   0  44 122  15  23  35  47  94  35 111 108   7 303   3
      5   26  27  44  44   0  34   3   3   5  11  32   5  44  34   1  74   0
      6   90 146 234 122  34   0  31  43  84  80 152  23 173 209  15 505   5
      7   11  16  22  15   3  31   0   4   8  12  16   3  24  32   1  68   0
      8   20  18  33  23   3  43   4   0  22  17  27   4  31  37   6  79   0
      9   29  43  73  35   5  84   8  22   0  58  75   4  70  92   0 264   2
      10  36  37  79  47  11  80  12  17  58   0   0   8  45 130   9 335   0
      11  83 124 173  94  32 152  16  27  75   0   0  18 229 158   9 400   0
      12  13  15  21  35   5  23   3   4   4   8  18   0  14  14   0  45   1
      13  86 160 197 111  44 173  24  31  70  45 229  14   0 161  10 455   5
      14  86 106 190 108  34 209  32  37  92 130 158  14 161   0  10   0   6
      15   3   3  11   7   1  15   1   6   0   9   9   0  10  10   0  27   0
      16 186 269 492 303  74 505  68  79 264 335 400  45 455   0  27   0  14
      17   1   1   3   3   0   5   0   0   2   0   0   1   5   6   0  14   0
    

    Convert to data.frame if required

    as.data.frame.matrix(xtabs(count ~ d1 + d2, df)) 
       1   2   3   4  5   6  7  8   9  10  11 12  13  14 15  16 17
    1    0 121  99  67 26  90 11 20  29  36  83 13  86  86  3 186  1
    2  121   0 166 105 27 146 16 18  43  37 124 15 160 106  3 269  1
    3   99 166   0 165 44 234 22 33  73  79 173 21 197 190 11 492  3
    4   67 105 165   0 44 122 15 23  35  47  94 35 111 108  7 303  3
    5   26  27  44  44  0  34  3  3   5  11  32  5  44  34  1  74  0
    6   90 146 234 122 34   0 31 43  84  80 152 23 173 209 15 505  5
    7   11  16  22  15  3  31  0  4   8  12  16  3  24  32  1  68  0
    8   20  18  33  23  3  43  4  0  22  17  27  4  31  37  6  79  0
    9   29  43  73  35  5  84  8 22   0  58  75  4  70  92  0 264  2
    10  36  37  79  47 11  80 12 17  58   0   0  8  45 130  9 335  0
    11  83 124 173  94 32 152 16 27  75   0   0 18 229 158  9 400  0
    12  13  15  21  35  5  23  3  4   4   8  18  0  14  14  0  45  1
    13  86 160 197 111 44 173 24 31  70  45 229 14   0 161 10 455  5
    14  86 106 190 108 34 209 32 37  92 130 158 14 161   0 10   0  6
    15   3   3  11   7  1  15  1  6   0   9   9  0  10  10  0  27  0
    16 186 269 492 303 74 505 68 79 264 335 400 45 455   0 27   0 14
    17   1   1   3   3  0   5  0  0   2   0   0  1   5   6  0  14  0
    

    Or may use dcast

    library(data.table)
     dcast(df, d1 ~ d2, value.var = 'count')
    Key: <d1>
           d1     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
        <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
     1:     1    NA   121    99    67    26    90    11    20    29    36    83    13    86    86     3   186     1
     2:     2   121    NA   166   105    27   146    16    18    43    37   124    15   160   106     3   269     1
     3:     3    99   166    NA   165    44   234    22    33    73    79   173    21   197   190    11   492     3
     4:     4    67   105   165    NA    44   122    15    23    35    47    94    35   111   108     7   303     3
     5:     5    26    27    44    44    NA    34     3     3     5    11    32     5    44    34     1    74    NA
     6:     6    90   146   234   122    34    NA    31    43    84    80   152    23   173   209    15   505     5
     7:     7    11    16    22    15     3    31    NA     4     8    12    16     3    24    32     1    68    NA
     8:     8    20    18    33    23     3    43     4    NA    22    17    27     4    31    37     6    79    NA
     9:     9    29    43    73    35     5    84     8    22    NA    58    75     4    70    92    NA   264     2
    10:    10    36    37    79    47    11    80    12    17    58    NA    NA     8    45   130     9   335    NA
    11:    11    83   124   173    94    32   152    16    27    75    NA    NA    18   229   158     9   400    NA
    12:    12    13    15    21    35     5    23     3     4     4     8    18    NA    14    14    NA    45     1
    13:    13    86   160   197   111    44   173    24    31    70    45   229    14    NA   161    10   455     5
    14:    14    86   106   190   108    34   209    32    37    92   130   158    14   161    NA    10    NA     6
    15:    15     3     3    11     7     1    15     1     6    NA     9     9    NA    10    10    NA    27    NA
    16:    16   186   269   492   303    74   505    68    79   264   335   400    45   455    NA    27    NA    14
    17:    17     1     1     3     3    NA     5    NA    NA     2    NA    NA     1     5     6    NA    14    NA