rgroup-bysubject

How to select only first non NA value of each group in R?


I have a data frame like

mydata <- data.frame(Id=c(01,01,01,01,01,01,02,02,02,02),
                     VISIT=c("Screeing","Baseline","Baseline","Baseline","Week 9","Week 9","Baseline","Week 2",
                             "Week 2","Week 2"),
                    Score=c(1,2,4,5,78,9,5,NA,3,4))

> mydata
   Id    VISIT Score
1   1 Screeing     1
2   1 Baseline     2
3   1 Baseline     4
4   1 Baseline     5
5   1   Week 9    78
6   1   Week 9     9
7   2 Baseline     5
8   2   Week 2     NA
9   2   Week 2     3
10  2   Week 2     4

What I am trying to do is to group by Id and VISIT and choose the first non NA value of each group as

> mydata

      Id VISIT    Score 
   <dbl> <fct>    <dbl> 
 1     1 Screeing     1     
 2     1 Baseline     2     
 5     1 Week 9      78    
 7     2 Baseline     5     
 9     2 Week 2       3     

This came to my mind

mydata<-mydata %>%
 group_by(Id,VISIT) %>% 
 mutate(first = dplyr::first(na.omit(Score)))

But it does not remove other rows, and it just create a new column with repeated values of first non NA of each group.


Solution

  • A dplyr alternative. Assuming that by "first" you simply mean the first row, in the order given, by group.

    Note that (Id, VISIT) in your example data gives 2 groups for Baseline.

    library(dplyr)
    
    mydata %>% 
      group_by(Id, VISIT) %>% 
      filter(!is.na(Score)) %>% 
      slice(1) %>% 
      ungroup()
    

    Result:

    # A tibble: 5 x 3
         Id VISIT    Score
      <dbl> <chr>    <dbl>
    1     1 Baseline     2
    2     1 Screeing     1
    3     1 Week 9      78
    4     2 Baseline     5
    5     2 Week 2       3