rgroup-bysummarizesimplifymutate

How to turn several rows of values into one row in R


I am simplifying and summarizing my data so that I can run an ANOVA on it so I want to compress the rows in one column into less rows. I feel like it only makes sense looking at the data so,

this is a simplified version of what my data looks like:


Participant ID Question_number Question_Rating_Type Avg_Participant_Rating
10556 1 1 2.303922
10556 1 2 2.507365
10556 2 1 2.303922
10556 2 2 2.507365
10556 3 1 2.303922
10556 3 2 2.507365
10556 4 1 2.303922
10556 4 2 2.507365

This is what I want it to look like:

Participant ID Question_range Question_Rating_Type Avg_Participant_Rating
10556 1 - 6 1 2.303922
10556 1 - 6 2 2.507365
10557 1 - 6 1 3.557155
10557 1 - 6 2 1.507365
10558 1 - 6 1 3.556154
10558 1 - 6 2 1.657311
10559 1 - 6 1 2.357325
10559 1 - 6 2 1.522221

So, I want to compress the Question_numbers so that the averages are shown as distinct values rather than being printed with each question_number, since they represent an average of the questions 1-6 of each participant's average question rating. I use tidyverse packages etc so if people have suggestions using those packages that would work!!

Edit: The responses have tried to average Avg_Participant_Rating but they are already averages! They are already matched up with their respective Participant Id, they are repeating in the data frame because they represent the average rating of questions 1-6 so I want that to be reflected by having the Question_number column to say 1-6 for each Participant Id.


Solution

  • You could use summarise(), grouping by participant ID and question rating type, if I understand what you want to do:

    library(tidyverse)  
    
    dat <- read.table(textConnection("ParticipantID Question_number Question_Rating_Type    Avg_Participant_Rating
    10556   1   1   2.303922
    10556   1   2   2.507365
    10556   2   1   2.303922
    10556   2   2   2.507365
    10556   3   1   2.303922
    10556   3   2   2.507365
    10556   4   1   2.303922
    10556   4   2   2.507365"), header=TRUE)
    
    dat %>% group_by(ParticipantID, Question_Rating_Type) %>% 
      summarise(Question_number = glue::glue("{min(Question_number)}-{max(Question_number)}"), 
                Avg_Participant_Rating = mean(Avg_Participant_Rating)) %>% 
      select(1,3,2,4)
    #> `summarise()` has grouped output by 'ParticipantID'. You can override using the
    #> `.groups` argument.
    #> # A tibble: 2 × 4
    #> # Groups:   ParticipantID [1]
    #>   ParticipantID Question_number Question_Rating_Type Avg_Participant_Rating
    #>           <int> <glue>                         <int>                  <dbl>
    #> 1         10556 1-4                                1                   2.30
    #> 2         10556 1-4                                2                   2.51
    

    Created on 2024-02-06 with reprex v2.0.2