I have several datasets, with df as the main data frame (imagine that all of them are very large):
df = data.frame(x = seq(1,20,2),
y = c('a','a','b','c','a','a','b','c','a','a'),
z = c('d','e','e','d','f','e','e','d','e','f') )
g = data.frame(xx = c(2,3,4,5,7,8,9) )
h = data.frame(xx = c(3,5,7,8,9) )
i = data.frame(xx = c(2,3,6,8) )
j = data.frame(xx = c(1,3,6) )
I want to build a set of frequency tables for the y column of df, each time using the xx column of one of the other data frames to subset df.
Then I want to build the same set of frequency tables for the z column of df, again subsetting df by the xx column of each other data frame in turn.
Next, I would like to visualise the frequencies of each value of one variable to study how it develops across the datasets.
For example, for variable y, the count of the value a going from g to j is: 2 2 1 2. I would like to visualise this development for each value of y in a simple way.
We could place the datasets in a list (dplyr::lst returns a named list), loop over the list with map, subset the main dataset based on the 'x' column or do an inner_join, and get the frequency count
library(dplyr)
library(purrr)
map(lst(g, h, i,j),
~ inner_join(df, .x, by = c("x" = "xx")) %>%
count(y, name = 'Count'))
-output
$g
  y Count
1 a     2
2 b     1
3 c     1

$h
  y Count
1 a     2
2 b     1
3 c     1

$i
  y Count
1 a     1

$j
  y Count
1 a     2
Or in base R (the \(dat) lambda shorthand and the |> pipe need R 4.1 or later)
lapply(list(g = g, h = h, i = i, j = j),
\(dat) subset(df, x %in% dat$xx, select = y ) |>
table())
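The question also asks for the z column, and values that are absent from a subset (b and c in i and j) simply drop out of the tables above. Here is a minimal base R sketch that covers both, assuming it is acceptable to take the factor levels from the full df so that absent values are reported as 0. The helper name count_by and the list name sets are made up for illustration.

```r
# Data from the question
df <- data.frame(x = seq(1, 20, 2),
                 y = c('a','a','b','c','a','a','b','c','a','a'),
                 z = c('d','e','e','d','f','e','e','d','e','f'))
sets <- list(g = data.frame(xx = c(2, 3, 4, 5, 7, 8, 9)),
             h = data.frame(xx = c(3, 5, 7, 8, 9)),
             i = data.frame(xx = c(2, 3, 6, 8)),
             j = data.frame(xx = c(1, 3, 6)))

# count_by is a hypothetical helper: for one column of df, tabulate the
# rows whose x appears in each dataset's xx. Fixing the factor levels
# from the full df keeps a row for every value, with 0 where it is absent.
count_by <- function(col) {
  lev <- sort(unique(df[[col]]))
  sapply(sets, function(dat) {
    table(factor(df[[col]][df$x %in% dat$xx], levels = lev))
  })
}

y_counts <- count_by("y")  # rows a, b, c; columns g, h, i, j
z_counts <- count_by("z")  # rows d, e, f; columns g, h, i, j
y_counts
z_counts
```

Each row of the resulting matrix can be read as the development of one value from g to j; the a row of y_counts is 2 2 1 2, as in the question.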
To visualize the counts, either combine the results into a single dataset and draw a bar plot with geom_col/geom_bar, or use barplot in base R
library(ggplot2)
map_dfr(lst(g, h, i,j),
~ inner_join(df, .x, by = c("x" = "xx")) %>%
count(y, name = 'Count'), .id = 'grp') %>%
ggplot(aes(x = grp, y = Count, fill = y)) +
geom_col(position = "dodge")
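For the base R route, one possible sketch is to collect the counts in a matrix (one row per value of y, one column per dataset) and pass it to barplot. This assumes the factor levels are taken from the full df so that missing values appear as bars of height 0; the names sets and counts are made up for illustration.

```r
# Data from the question
df <- data.frame(x = seq(1, 20, 2),
                 y = c('a','a','b','c','a','a','b','c','a','a'),
                 z = c('d','e','e','d','f','e','e','d','e','f'))
sets <- list(g = data.frame(xx = c(2, 3, 4, 5, 7, 8, 9)),
             h = data.frame(xx = c(3, 5, 7, 8, 9)),
             i = data.frame(xx = c(2, 3, 6, 8)),
             j = data.frame(xx = c(1, 3, 6)))

# Count matrix: rows are the values of y, columns are the datasets
lev <- sort(unique(df$y))
counts <- sapply(sets, function(dat) {
  table(factor(df$y[df$x %in% dat$xx], levels = lev))
})

# One cluster of bars per dataset, one bar per value of y
barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "dataset", ylab = "Count")
```

Swapping df$y for df$z gives the corresponding plot for the z column.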