I am simplifying and summarizing my data so that I can run an ANOVA on it so I want to compress the rows in one column into less rows. I feel like it only makes sense looking at the data so,
this is a simplified version of what my data looks like:
Participant ID | Question_number | Question_Rating_Type | Avg_Participant_Rating |
---|---|---|---|
10556 | 1 | 1 | 2.303922 |
10556 | 1 | 2 | 2.507365 |
10556 | 2 | 1 | 2.303922 |
10556 | 2 | 2 | 2.507365 |
10556 | 3 | 1 | 2.303922 |
10556 | 3 | 2 | 2.507365 |
10556 | 4 | 1 | 2.303922 |
10556 | 4 | 2 | 2.507365 |
This is what I want it to look like:
Participant ID | Question_range | Question_Rating_Type | Avg_Participant_Rating |
---|---|---|---|
10556 | 1 - 6 | 1 | 2.303922 |
10556 | 1 - 6 | 2 | 2.507365 |
10557 | 1 - 6 | 1 | 3.557155 |
10557 | 1 - 6 | 2 | 1.507365 |
10558 | 1 - 6 | 1 | 3.556154 |
10558 | 1 - 6 | 2 | 1.657311 |
10559 | 1 - 6 | 1 | 2.357325 |
10559 | 1 - 6 | 2 | 1.522221 |
So, I want to compress the Question_numbers so that the averages are shown as distinct values rather than being printed with each question_number, since they represent an average of the questions 1-6 of each participant's average question rating. I use tidyverse packages etc so if people have suggestions using those packages that would work!!
Edit: The responses have tried to average Avg_Participant_Rating but they are already averages! They are already matched up with their respective Participant Id, they are repeating in the data frame because they represent the average rating of questions 1-6 so I want that to be reflected by having the Question_number column to say 1-6 for each Participant Id.
You could use summarise()
, grouping by participant ID and question rating type, if I understand what you want to do:
library(tidyverse)
dat <- read.table(textConnection("ParticipantID Question_number Question_Rating_Type Avg_Participant_Rating
10556 1 1 2.303922
10556 1 2 2.507365
10556 2 1 2.303922
10556 2 2 2.507365
10556 3 1 2.303922
10556 3 2 2.507365
10556 4 1 2.303922
10556 4 2 2.507365"), header=TRUE)
dat %>% group_by(ParticipantID, Question_Rating_Type) %>%
summarise(Question_number = glue::glue("{min(Question_number)}-{max(Question_number)}"),
Avg_Participant_Rating = mean(Avg_Participant_Rating)) %>%
select(1,3,2,4)
#> `summarise()` has grouped output by 'ParticipantID'. You can override using the
#> `.groups` argument.
#> # A tibble: 2 × 4
#> # Groups: ParticipantID [1]
#> ParticipantID Question_number Question_Rating_Type Avg_Participant_Rating
#> <int> <glue> <int> <dbl>
#> 1 10556 1-4 1 2.30
#> 2 10556 1-4 2 2.51
Created on 2024-02-06 with reprex v2.0.2