rdplyrsequential

Sequential count of values within factor level, ignoring NAs


I have mark-recapture data in long form, and I want a column counting the number of times each individual has been seen at the time of each observation.

here is an example of the sort of data that I have:

dat<-tibble(ID=c("A","A","A","A","A","B","B","B","B","B"),
           period=c("Aug.2012","Jun.2013","Aug.2013","Jun.2014","Aug.2014",
                    "Aug.2012","Jun.2013","Aug.2013","Jun.2014","Aug.2014"),
           length=c(12,NA,NA,15,19, NA,3,6,10,NA))
dat$sample.event<-rep(1:5,dim(dat)[1]/5)

# A tibble: 10 × 4
   ID    period   length sample.event
   <chr> <chr>     <dbl>        <int>
 1 A     Aug.2012     12            1
 2 A     Jun.2013     NA            2
 3 A     Aug.2013     NA            3
 4 A     Jun.2014     15            4
 5 A     Aug.2014     19            5
 6 B     Aug.2012     NA            1
 7 B     Jun.2013      3            2
 8 B     Aug.2013      6            3
 9 B     Jun.2014     10            4
10 B     Aug.2014     NA            5

so I want a new column called ind.obs counting each time an individual has been seen, like this, but I want to keep the rows of the dataframe where the individual was not seen:

   ID   period length sample.event ind.obs
1   A Aug.2012     12            1       1
2   A Jun.2013     NA            2      NA
3   A Aug.2013     NA            3      NA
4   A Jun.2014     15            4       2
5   A Aug.2014     19            5       3
6   B Aug.2012     NA            1      NA
7   B Jun.2013      3            2       1
8   B Aug.2013      6            3       2
9   B Jun.2014     10            4       3
10  B Aug.2014     NA            5      NA

This seems like it should be possible using dplyr, but I can't figure it out.

I have tried:

dat%>%group_by(ID)%>%
  drop_na(length) %>%
  mutate(ind.obs=sequence(n()))

  ID    period   length sample.event ind.obs
  <chr> <chr>     <dbl>        <int>   <int>
1 A     Aug.2012     12            1       1
2 A     Jun.2014     15            4       2
3 A     Aug.2014     19            5       3
4 B     Jun.2013      3            2       1
5 B     Aug.2013      6            3       2
6 B     Jun.2014     10            4       3

But as you can see, this completely removes the rows without observations.

I've also tried this, but get an error:

dat%>%group_by(ID) %>%mutate(ind.obs=sequence(n(na.rm=T)))

Error in `mutate()`:
ℹ In argument: `ind.obs = sequence(n(na.rm = T))`.
ℹ In group 1: `ID = "A"`.
Caused by error in `n()`:
! unused argument (na.rm = T)

Would appreciate any tips for resolving this, thanks


Solution

  • I bet there's something sharper, but:

    dat %>%   
      mutate(ind.obs = if_else(is.na(length), length, 
                               cumsum(!is.na(length))), .by=ID)
    # Thanks @TarJae for the helpful improvement
    

    Result

    # A tibble: 10 × 5
       ID    period   length sample.event ind.obs
       <chr> <chr>     <dbl>        <int>   <dbl>
     1 A     Aug.2012     12            1       1
     2 A     Jun.2013     NA            2      NA
     3 A     Aug.2013     NA            3      NA
     4 A     Jun.2014     15            4       2
     5 A     Aug.2014     19            5       3
     6 B     Aug.2012     NA            1      NA
     7 B     Jun.2013      3            2       1
     8 B     Aug.2013      6            3       2
     9 B     Jun.2014     10            4       3
    10 B     Aug.2014     NA            5      NA