r group-by summarize

Count the values of a column based on the values of another column in R


Starting from the data frame below, I want to create a new one containing Zip, Name, and a column named Count that holds the number of times each Name occurs per Zip.

Zip <- c("123245", "12345", "123245", "123456", "123456", "12345")
Name <- c("Bob", "Bob", "Bob", "Jack", "Jack", "Mary")
df <- data.frame(Zip, Name)

library(dplyr)

# My attempt so far; it does not produce the per-Zip counts I need
df %>%
  group_by(Zip) %>%
  mutate(Name = cumsum(Name))

Expected output:

     Zip Name Count
1 123245  Bob     2
2  12345  Bob     1
3  12345 Mary     1
4 123456 Jack     2

Solution

  • Does this solve your problem?

    Zip<-c("123245","12345","123245","123456","123456","12345")
    Name<-c("Bob","Bob","Bob","Jack","Jack","Mary")
    df<-data.frame(Zip,Name)
    
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    df %>%
      group_by(Zip, Name) %>%
      summarise(Count = n())
    #> `summarise()` has grouped output by 'Zip'. You can override using the `.groups` argument.
    #> # A tibble: 4 × 3
    #> # Groups:   Zip [3]
    #>   Zip    Name  Count
    #>   <chr>  <chr> <int>
    #> 1 123245 Bob       2
    #> 2 12345  Bob       1
    #> 3 12345  Mary      1
    #> 4 123456 Jack      2
    

    Created on 2021-12-22 by the reprex package (v2.0.1)
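
The `summarise()` message above appears because the result is still grouped by Zip. If an ungrouped result is preferred, one option is the `.groups` argument that the message itself mentions. A minimal sketch, assuming the same `df` as above (the name `counts` is only illustrative):

```r
library(dplyr)

Zip  <- c("123245", "12345", "123245", "123456", "123456", "12345")
Name <- c("Bob", "Bob", "Bob", "Jack", "Jack", "Mary")
df   <- data.frame(Zip, Name)

# Same grouping as above; .groups = "drop" returns an ungrouped tibble
# and suppresses the "grouped output" message.
counts <- df %>%
  group_by(Zip, Name) %>%
  summarise(Count = n(), .groups = "drop")

counts
```

Dropping the groups matters mainly if later steps (for example another `mutate()` or `summarise()`) should operate on the whole table rather than per Zip.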

    --

    Quick speed benchmark of three approaches (dplyr group_by() + summarise(), dplyr count(), and base R aggregate()):

    library(tidyverse)
    library(microbenchmark)
    
    Zip<-c("123245","12345","123245","123456","123456","12345")
    Name<-c("Bob","Bob","Bob","Jack","Jack","Mary")
    df<-data.frame(Zip,Name)
    
    JM <- function(df){
      df %>%
        group_by(Zip, Name) %>%
        summarise(Count = n())
    }
    JM(df)
    #> `summarise()` has grouped output by 'Zip'. You can override using the `.groups` argument.
    #> # A tibble: 4 × 3
    #> # Groups:   Zip [3]
    #>   Zip    Name  Count
    #>   <chr>  <chr> <int>
    #> 1 123245 Bob       2
    #> 2 12345  Bob       1
    #> 3 12345  Mary      1
    #> 4 123456 Jack      2
    
    TarJae <- function(df){
      df %>% 
        count(Zip, Name, name = "Count")
    }
    
    TIC <- function(df){
      aggregate(cbind(Count = Zip) ~ Zip + Name, df, length)
    }
    TIC(df)
    #>      Zip Name Count
    #> 1 123245  Bob     2
    #> 2  12345  Bob     1
    #> 3 123456 Jack     2
    #> 4  12345 Mary     1
    
    res <- microbenchmark(JM(df), TIC(df), TarJae(df))
    autoplot(res)
    #> Coordinate system already present. Adding new coordinate system, which will replace the existing one.
    

    Created on 2021-12-22 by the reprex package (v2.0.1)
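
Before reading anything into the timings, it may be worth checking that the three functions actually return the same counts, since `aggregate()` orders its rows differently from the dplyr versions. The sketch below assumes the same definitions as in the benchmark (with `.groups = "drop"` added to `JM` only to keep the output quiet); `normalize()` is a hypothetical helper introduced here, not part of any package.

```r
library(dplyr)

Zip  <- c("123245", "12345", "123245", "123456", "123456", "12345")
Name <- c("Bob", "Bob", "Bob", "Jack", "Jack", "Mary")
df   <- data.frame(Zip, Name)

# The three approaches from the benchmark
JM <- function(df) {
  df %>%
    group_by(Zip, Name) %>%
    summarise(Count = n(), .groups = "drop")
}
TarJae <- function(df) {
  df %>%
    count(Zip, Name, name = "Count")
}
TIC <- function(df) {
  aggregate(cbind(Count = Zip) ~ Zip + Name, df, length)
}

# Hypothetical helper: convert to a plain data frame, sort rows by
# Zip then Name, and reset row names so results can be compared.
normalize <- function(x) {
  x <- as.data.frame(x)
  x <- x[order(x$Zip, x$Name), c("Zip", "Name", "Count")]
  rownames(x) <- NULL
  x
}

# Compare each alternative against the group_by() + summarise() result
all.equal(normalize(JM(df)), normalize(TarJae(df)))
all.equal(normalize(JM(df)), normalize(TIC(df)))
```

With only six rows, the benchmark likely measures mostly fixed per-call overhead, so the ranking may not carry over to larger data.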