rdataframedplyrtibblesummarize

Combine (summarize) certain rows based on a category


I'm working with a simple dataframe in R, similar to this one:

data <- data.frame(
  "food"= c("Banana", "Orange", "Apple", "Meat", "Fish", "Cherries", "Wheat"),
  "kg"= c(2,3,1,2,6,4,5)
)

My point is that I would like to create a new row, called "Fruits" and having the kg value of "Banana", "Orange", "Apple" and "Cherries" combined (I would then delete those and just keep "Fruits").

The closer I've come to a solution has been this attempt:

library(tidyverse)
data <- data %>% 
  add_row(.data = data.frame(food="Fruits",
                             total=sum(data$kg[c=1,2,3,6])))

# Error message: 'incorrect number of dimensions'

I am quite new at R, so I don't know how to create a value that stores the addition of the rows I wanted and include it in a new row.


Solution

  • Let's say we have 2 categories (Fruits and Protein), while "Wheat" is not in either groups, hence, won't be "summarized" (if you only want "Fruits" to be summarized, you simply drop the Protein from categories list).

    library(tibble)
    library(tidyr)
    library(dplyr)
    
    categories <- list(Fruits = c("Banana", "Orange", "Apple", "Cherries"), 
                       Protein = c("Meat", "Fish"))
    
    enframe(categories) %>% 
      unnest(value) %>% 
      full_join(data, .,  join_by(food == value)) %>% 
      mutate(name = coalesce(name, food)) %>% 
      summarise(food = first(name), kg = sum(kg, na.rm = T), .by = name) %>% 
      select(-name)
    
    #>      food kg
    #> 1  Fruits 10
    #> 2 Protein  8
    #> 3   Wheat  5
    

    Created on 2024-03-12 with reprex v2.0.2