I have two columns in a data frame I want to summarize by region. I then want to add a totals row to the summary table, but the total is a count of the number rows in the summary table with a value > 0. I have to repeat this for several different data frames.
region <- c("R1", "R2", "R3", "R4", "R5", "R1", "R2", "R3", "R4", "R5")
housing <- c(0,1,1,1,1,0,1,0,0,1)
food <- c(1,1,0,1,1,0,1,0,0,1)
df <- data.frame(region,housing,food)
Using janitor's adorn_totals is close, but it sums the values for each column.
library(janitor)
df %>%
group_by(region) %>%
summarise_at(vars(housing:food), sum, na.rm = TRUE) %>%
adorn_totals(where = "row", name = "Not the total I want")
Nice, but I want to count the number of rows in each column that are greater than 0, like this:
I tried sum with ifelse which I couldn't get to work (like Excel countif). I can create a separate summary table and then bind_rows, but that's a lot of steps. Is there a way to use adorn_totals? Any suggestions are appreciated.
You could summarize
the sums first and bind
those results to a second summarize
result.
library(dplyr)
summarized_df <- df |>
group_by(region) |>
summarise(across(housing:food, \(x) sum(x, na.rm = TRUE)))
bind_rows(
summarized_df,
summarized_df |>
summarise(region = 'Count',
across(housing:food, \(x) sum(x > 0)))
)
#> # A tibble: 6 × 3
#> region housing food
#> <chr> <dbl> <dbl>
#> 1 R1 0 1
#> 2 R2 2 2
#> 3 R3 1 0
#> 4 R4 1 1
#> 5 R5 2 2
#> 6 Count 4 4