I have around 20-30 dbf files, which I imported into R. I cannot combine them into one data frame/table because the total size then comes to around 2 GB. I want to create a new column, "avg_spends", in each file, computed by grouping on age and ctg.
When I combined the files into one data table, I ran the following command using dplyr:
file_combo <- dbf_file %>% group_by(ctg, age) %>% mutate(avg_spends =
mean(total_spend))
This is just the first step. Similarly, I have to make further new columns based on the columns already available or created. How do I make this work by splitting the data on the first column (file1, file2, etc.)?
I also need a separate output for each file.
This is an example of the data that I have:
files || age || ctg || total_spend
==================================
file1 || 45 || 1 || 1026
file1 || 26 || 2 || 1574
file1 || 45 || 1 || 64
file1 || 32 || 1 || 1610
file2 || 41 || 1 || 884
file2 || 22 || 1 || 530
file2 || 41 || 2 || 451
file2 || 22 || 1 || 520
file3 || 21 || 2 || 727
file3 || 34 || 1 || 562
file3 || 43 || 2 || 452
file3 || 23 || 1 || 851
You can achieve this by storing all of your files in a list and performing the action on the entire list with lapply(), like so:
library(dplyr)
# Example data frames standing in for the imported dbf files
file1 <- data.frame(age = c(45,26,45,32), ctg = c(1,2,1,1), total_spend = c(1026, 1574, 64, 1610))
file2 <- data.frame(age = c(41,22,41,22), ctg = c(1,1,2,1), total_spend = c(884, 530, 451, 520))
file3 <- data.frame(age = c(21,34,43,23), ctg = c(2,1,2,1), total_spend = c(727, 562, 452, 851))
# Put the data frames in a list and add the grouped mean to each one
files <- list(file1, file2, file3)
result <- lapply(files, function(x) x %>% group_by(ctg, age) %>% mutate(avg_spends = mean(total_spend)))
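Since you also need a separate output per file, one way to extend this is to name the list elements and write each result out under its own name. The following is only a sketch: the helper name add_avg_spends, the out_dir location, the "_out.csv" file names and the choice of CSV as the output format are all assumptions for illustration, not something taken from your setup.

```r
library(dplyr)

# Hypothetical helper: add the grouped mean, then drop the grouping so
# later steps that build on avg_spends start from an ungrouped table
add_avg_spends <- function(x) {
  x %>%
    group_by(ctg, age) %>%
    mutate(avg_spends = mean(total_spend)) %>%
    ungroup()
}

# Same example data as above, trimmed to two files
file1 <- data.frame(age = c(45, 26, 45, 32), ctg = c(1, 2, 1, 1),
                    total_spend = c(1026, 1574, 64, 1610))
file2 <- data.frame(age = c(41, 22, 41, 22), ctg = c(1, 1, 2, 1),
                    total_spend = c(884, 530, 451, 520))

# A named list keeps track of which result belongs to which file
files <- list(file1 = file1, file2 = file2)
result <- lapply(files, add_avg_spends)

# Assumed output location and format: one CSV per input file
out_dir <- tempdir()
for (nm in names(result)) {
  write.csv(result[[nm]],
            file.path(out_dir, paste0(nm, "_out.csv")),
            row.names = FALSE)
}
```

If holding every file in memory at once is the problem, the same helper could instead be called inside a loop that reads one dbf file (for example with foreign::read.dbf), writes its result, and then moves on to the next file.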