Tags: r, data.table, probability, bayesian, hidden-markov-models

Best way to add the most likely event and its probability from a cross table in R


Using the mtcars dataset, I created a cross table as follows:

tab = with(mtcars, ftable(gear, cyl))
tab

Here is how it looks:

     cyl  4  6  8
gear             
3         1  2 12
4         8  4  0
5         2  1  2

For this cross table, I calculated the row-wise probabilities (as percentages):

library(magrittr)  # provides the %>% pipe
tab_prob = tab %>% prop.table(1) %>% round(4) * 100
tab_prob
     cyl     4     6     8
gear                      
3         6.67 13.33 80.00
4        66.67 33.33  0.00
5        40.00 20.00 40.00

I want to add two columns to the original mtcars dataset:

  1. Column 1, cyl_exp: the most likely cyl value for the car's gear, according to the cross table. For example, if the number of gears is 3, this column should contain 8, because the tab cross table shows an 80% probability that cyl is 8 when gear is 3.
  2. Column 2, cyl_prob: the probability from tab_prob that corresponds to the value in cyl_exp.
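In other words, cyl_exp is the row-wise mode of the cross table and cyl_prob is that mode's share of its row. A minimal worked check for gear = 3, with the counts copied by hand from the table above (the vector name counts_gear3 is just for illustration):

```r
# Counts for gear == 3, copied from the cross table above (cyl = 4, 6, 8)
counts_gear3 <- c(`4` = 1, `6` = 2, `8` = 12)

# Row-wise probabilities: 1/15, 2/15, 12/15
probs_gear3 <- counts_gear3 / sum(counts_gear3)

# Most likely cyl (the mode) and its probability in percent
cyl_exp  <- as.numeric(names(which.max(probs_gear3)))
cyl_prob <- round(100 * max(probs_gear3), 2)

cyl_exp   # 8
cyl_prob  # 80
```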

Here is the expected outcome:

head(mtcars)
    mpg cyl disp  hp drat    wt  qsec vs am gear carb cyl_prob cyl_exp
1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    66.67       4
2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    66.67       4
3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1    66.67       4
4: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    80.00       8
5: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2    80.00       8
6: 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1    80.00       8

Is there an easy way to accomplish this?

Thanks!


Solution

  • With data.table, I would do it this way (note that cyl_prob comes out as a proportion between 0 and 1, not a percentage):

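    library(data.table)
    
    mtcars <- as.data.table(mtcars, keep.rownames = TRUE)
    
    # Count each (gear, cyl) combination, then convert counts to within-gear probabilities
    tab <- mtcars[, .N, by = .(gear, cyl)]
    tab[, prob := N/sum(N), by = .(gear)]
    # Keep the most likely cyl per gear; ties (e.g. gear 5: 4 vs 8) go to the smaller cyl
    tab <- tab[order(-prob, cyl)][!duplicated(gear)]
    # Update join: add cyl_exp and cyl_prob to mtcars by reference, matching on gear
    mtcars[tab, `:=`(cyl_exp = i.cyl, cyl_prob = i.prob), on = .(gear)]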
    
    # > head(mtcars)
    #                   rn  mpg cyl disp  hp drat    wt  qsec vs am gear carb cyl_exp  cyl_prob
    # 1:         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4       4 0.6666667
    # 2:     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4       4 0.6666667
    # 3:        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1       4 0.6666667
    # 4:    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1       8 0.8000000
    # 5: Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2       8 0.8000000
    # 6:           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1       8 0.8000000
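If you would rather not depend on data.table, here is a minimal base-R sketch of the same lookup. It swaps the join for table(), prop.table() and which.max(); like order(-prob, cyl) above, which.max() returns the first maximum, so the gear-5 tie between 4 and 8 cylinders resolves to 4. It also stores cyl_prob as a rounded percentage to match the expected output in the question. The intermediate variable names are illustrative.

```r
# Base-R sketch: per-gear most likely cyl and its probability, no data.table needed
data(mtcars)  # reload the built-in data.frame (it may have been converted to a data.table above)

counts <- table(mtcars$gear, mtcars$cyl)   # rows = gear (3, 4, 5), columns = cyl (4, 6, 8)
probs  <- prop.table(counts, margin = 1)   # row-wise probabilities

# which.max() picks the first maximum and table() sorts cyl in increasing order,
# so the gear-5 tie (4 vs 8) goes to the smaller cyl, as in the data.table answer
mode_cyl  <- as.numeric(colnames(probs)[apply(probs, 1, which.max)])
mode_prob <- unname(apply(probs, 1, max))

# Match each car's gear to the corresponding row of the table
idx <- match(mtcars$gear, as.numeric(rownames(probs)))
mtcars$cyl_exp  <- mode_cyl[idx]
mtcars$cyl_prob <- round(100 * mode_prob[idx], 2)   # percentages, as in the question

head(mtcars[, c("gear", "cyl", "cyl_exp", "cyl_prob")])
```

The data.table version can be brought to the same percentage format afterwards with mtcars[, cyl_prob := round(100 * cyl_prob, 2)].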