I have a dataset that looks something like this:
name | party | count | year | likes | retweet |
---|---|---|---|---|---|
Tom | R | 1 | 2016 | 1357 | 23 |
Dave | R | 1 | 2016 | 1881 | 34 |
Larry | D | 1 | 2016 | 324 | 45 |
Tim | D | 1 | 2016 | 5587 | 56 |
Rob | R | 1 | 2016 | 9847 | 67 |
Sam | D | 1 | 2016 | 4466 | 78 |
Tom | R | 1 | 2017 | ### | ## |
Dave | R | 1 | 2017 | ### | ## |
Larry | D | 1 | 2017 | ### | ## |
Tim | D | 1 | 2017 | ### | ## |
Rob | R | 1 | 2017 | ### | ## |
Sam | D | 1 | 2017 | ### | ## |
Tom | R | 1 | 2018 | ### | ## |
Dave | R | 1 | 2018 | ### | ## |
Larry | D | 1 | 2018 | ### | ## |
Tim | D | 1 | 2018 | ### | ## |
Rob | R | 1 | 2018 | ### | ## |
Sam | D | 1 | 2018 | ### | ## |
I have several columns that I want to collapse and sum, so that I can have a simpler output of a long dataset. I you'll notice that there are a bunch of "###" symbols, but that was just because I didn't think it was necessary to add additional information, as I believe what I provided is enough to solve my problem.
But I want to reformat the data, so it's easier to create a plot. Basically, I want the variables below. So, I want the first column to be the year (one row for each year in the dataset). The second column should be the sum of rows that were collapsed to create that row (I created the "count" variable in Table 1, so that I could sum the row value.) I then want four new columns, each representing the sum total of likes and retweets for Democrats and Republicans for that year. It should look something like this:
year | count | likes_republicans | likes_democrats | retweets_republicans | retweets_democrats |
---|---|---|---|---|---|
2016 | 6 | 13085 | 10377 | 124 | 179 |
2017 | 6 | ### | ### | ### | ### |
2018 | 6 | ### | ### | ### | ### |
I feel like there is a simple command in dplyr to do this type of problem, but I don't remember what it is and I feel like I really struggle to understand it and how it works. Any help would be greatly appreciated.
Sticking to data.table
.
> library(data.table)
> dt <- melt(dt, c(1, 2, 4), c(3, 5:6)) |>
+ dcast(year ~ party + variable, fun=sum)
> dt[, count := D_count + R_count]
> dt
Key: <year>
year D_count D_likes D_retweet R_count R_likes R_retweet count
<int> <int> <int> <int> <int> <int> <int> <int>
1: 2016 3 10377 6859 3 13085 4733 6
2: 2017 3 3766 5346 3 5944 7332 6
3: 2018 3 7426 2255 3 56041 6109 6
Data:
dt <- data.table::fread('
name party count year likes retweet
Tom R 1 2016 1357 23
Dave R 1 2016 1881 34
Larry D 1 2016 324 45
Tim D 1 2016 5587 56
Rob R 1 2016 9847 4676
Sam D 1 2016 4466 6758
Tom R 1 2017 3456 2543
Dave R 1 2017 2345 3546
Larry D 1 2017 675 1423
Tim D 1 2017 2345 3567
Rob R 1 2017 143 1243
Sam D 1 2017 746 356
Tom R 1 2018 134 1423
Dave R 1 2018 54673 563
Larry D 1 2018 5467 1243
Tim D 1 2018 1423 365
Rob R 1 2018 1234 4123
Sam D 1 2018 536 647
')