rdcast

How do I transpose observations into variables/columns?


I've got a data frame that looks something like this:

precinct, race, age, people
1001, black, 18-40, 1
1001, white, 18-40, 2
1001, hispanic, 18-40, 3
1001, asian, 18-40, 4
1001, black, 40 or older, 5
1001, white, 40 or older, 6
1001, hispanic, 40 or older, 7
1001, asian, 40 or older, 8

I want to make it look like this:

precinct, black, white, hispanic, asian, 18-40, 40 or older
1001, 6, 8, 10, 12, 10, 26

I've used dcast

dcast(
  data = mydataframe,
  formula = Precinct ~ race + age,
  fun.aggregate = sum,
  value.var = 'people'
)

but this does not produce my desired result.


Solution

  • When we create formula with + on the rhs of ~ it creates the combinations between those columns instead of having every single unique element from those columns. In order to have the latter, we may need to melt to long format and then use dcast on the single column (assuming those columns are of the same type)

    library(data.table)
    dcast(melt(setDT(mydataframe), id.var = c('precinct', 'people')),
        precinct ~ value, fun.aggregate = sum, value.var = 'people')
    

    -output

    Key: <precinct>
       precinct 18-40 40 or older asian black hispanic white
          <int> <int>       <int> <int> <int>    <int> <int>
    1:     1001    10          26    12     6       10     8
    

    library(dplyr)
    library(tidyr)
    mydataframe %>% 
      pivot_longer(cols = c(race, age), names_to = NULL) %>% 
      pivot_wider(names_from = value, values_from = people, values_fn = sum)
    

    -output

    # A tibble: 1 × 7
      precinct black `18-40` white hispanic asian `40 or older`
         <int> <int>   <int> <int>    <int> <int>         <int>
    1     1001     6      10     8       10    12            26
    

    data

    mydataframe <- structure(list(precinct = c(1001L, 1001L, 1001L, 1001L, 1001L, 
    1001L, 1001L, 1001L), race = c("black", "white", "hispanic", 
    "asian", "black", "white", "hispanic", "asian"), age = c("18-40", 
    "18-40", "18-40", "18-40", "40 or older", "40 or older", "40 or older", 
    "40 or older"), people = 1:8), row.names = c(NA, -8L), 
    class = "data.frame")