r dataframe subset

How to build frequency tables on a data frame for a series of subsets, and then visualise specific values from the resulting tables?


I have the following datasets, with df as the main data frame (imagine all of them as very big datasets):

df = data.frame(x = seq(1,20,2),
y = c('a','a','b','c','a','a','b','c','a','a'),
z = c('d','e','e','d','f','e','e','d','e','f') )

g = data.frame(xx = c(2,3,4,5,7,8,9) )

h = data.frame(xx = c(3,5,7,8,9) )

i = data.frame(xx = c(2,3,6,8) )

j = data.frame(xx = c(1,3,6) )

I want to build a set of frequency tables for the y column of df, one per other data frame, using each data frame's xx column to subset df (keeping the rows whose x appears in xx).

Then I want the same set of frequency tables for the z column of df, again using each xx to subset df.

Next:

I would like to visualise the frequency of each value of one variable to study how it develops across the subsets.

For example, for variable y, the count of the value a going from g to j is: 2 2 1 2. I would like to visualise this development for each value of variable y in a simple way.


Solution

  • We could place the datasets in a list (dplyr::lst returns a named list), loop over the list with map, subset the main dataset on the 'x' column (here with an inner_join), and get the frequency count

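    library(dplyr)
    library(purrr)
    # lst() names each element after its variable: g, h, i, j
    map(lst(g, h, i, j), 
       # keep only the rows of df whose x appears in the subset's xx
       ~ inner_join(df, .x, by = c("x" = "xx")) %>%      
           count(y, name = 'Count'))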
    

    -output

    $g
      y Count
    1 a     2
    2 b     1
    3 c     1
    
    $h
      y Count
    1 a     2
    2 b     1
    3 c     1
    
    $i
      y Count
    1 a     1
    
    $j
      y Count
    1 a     2
    

    Or in base R (the \(dat) lambda and the |> pipe need R 4.1 or later)

    lapply(list(g = g, h = h, i = i, j = j),
      \(dat) subset(df, x %in% dat$xx, select = y ) |>
          table())
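
    The question also asks for the z column, and for counts such as 2 2 1 2 for a across g to j. A minimal base R sketch of one way to get both, assuming it is acceptable to fix the factor levels so that values absent from a subset show up as 0 (freq_by_subset is a hypothetical helper name, not a function from any package):

    ```r
    # Data from the question
    df <- data.frame(x = seq(1, 20, 2),
                     y = c('a','a','b','c','a','a','b','c','a','a'),
                     z = c('d','e','e','d','f','e','e','d','e','f'))
    g <- data.frame(xx = c(2, 3, 4, 5, 7, 8, 9))
    h <- data.frame(xx = c(3, 5, 7, 8, 9))
    i <- data.frame(xx = c(2, 3, 6, 8))
    j <- data.frame(xx = c(1, 3, 6))
    subsets <- list(g = g, h = h, i = i, j = j)

    # Hypothetical helper: one column of counts per subset, one row per value.
    # Fixing the factor levels keeps values with zero occurrences in the table.
    freq_by_subset <- function(col) {
      lev <- sort(unique(df[[col]]))
      sapply(subsets, function(dat) {
        table(factor(df[[col]][df$x %in% dat$xx], levels = lev))
      })
    }

    freq_y <- freq_by_subset("y")
    freq_z <- freq_by_subset("z")

    freq_y["a", ]   # counts of a across g, h, i, j: 2 2 1 2
    ```

    Each row of freq_y or freq_z is then the development of one value across the subsets, in the order the subsets were listed.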
    

    To visualise, either bind the results into a single dataset and draw a bar plot with geom_col/geom_bar, or use barplot in base R

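    library(ggplot2)
    # map_dfr() row-binds the per-subset counts; .id stores the list name in 'grp'
    map_dfr(lst(g, h, i, j), 
       ~ inner_join(df, .x, by = c("x" = "xx")) %>%      
           count(y, name = 'Count'), .id = 'grp') %>% 
      # one group of bars per subset, one bar per value of y
      ggplot(aes(x = grp, y = Count, fill = y)) +
        geom_col(position = "dodge")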
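
    If the goal is to follow each value from g to j, a line per value may read more easily than dodged bars. A rough base R sketch of that alternative, which swaps the bar plot for matplot and assumes zero counts should be drawn as 0 rather than dropped (lev and counts are illustrative names):

    ```r
    # Data from the question
    df <- data.frame(x = seq(1, 20, 2),
                     y = c('a','a','b','c','a','a','b','c','a','a'),
                     z = c('d','e','e','d','f','e','e','d','e','f'))
    subsets <- list(g = data.frame(xx = c(2, 3, 4, 5, 7, 8, 9)),
                    h = data.frame(xx = c(3, 5, 7, 8, 9)),
                    i = data.frame(xx = c(2, 3, 6, 8)),
                    j = data.frame(xx = c(1, 3, 6)))

    # Rows = values of y, columns = subsets; fixed levels keep zero counts
    lev <- sort(unique(df$y))
    counts <- sapply(subsets, function(dat) {
      table(factor(df$y[df$x %in% dat$xx], levels = lev))
    })

    # One line per value of y, subsets g to j along the x axis
    matplot(t(counts), type = "b", pch = 19, lty = 1, col = seq_along(lev),
            xaxt = "n", xlab = "subset", ylab = "Count")
    axis(1, at = seq_len(ncol(counts)), labels = colnames(counts))
    legend("topright", legend = rownames(counts),
           col = seq_along(lev), lty = 1, pch = 19)
    ```

    Replacing df$y with df$z in the two places it appears gives the same plot for the z column.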