I have mark-recapture data in long form, and I want a column counting the number of times each individual has been seen at the time of each observation.
here is an example of the sort of data that I have:
dat<-tibble(ID=c("A","A","A","A","A","B","B","B","B","B"),
period=c("Aug.2012","Jun.2013","Aug.2013","Jun.2014","Aug.2014",
"Aug.2012","Jun.2013","Aug.2013","Jun.2014","Aug.2014"),
length=c(12,NA,NA,15,19, NA,3,6,10,NA))
dat$sample.event<-rep(1:5,dim(dat)[1]/5)
# A tibble: 10 × 4
ID period length sample.event
<chr> <chr> <dbl> <int>
1 A Aug.2012 12 1
2 A Jun.2013 NA 2
3 A Aug.2013 NA 3
4 A Jun.2014 15 4
5 A Aug.2014 19 5
6 B Aug.2012 NA 1
7 B Jun.2013 3 2
8 B Aug.2013 6 3
9 B Jun.2014 10 4
10 B Aug.2014 NA 5
so I want a new column called ind.obs
counting each time an individual has been seen, like this, but I want to keep the rows of the dataframe where the individual was not seen:
ID period length sample.event ind.obs
1 A Aug.2012 12 1 1
2 A Jun.2013 NA 2 NA
3 A Aug.2013 NA 3 NA
4 A Jun.2014 15 4 2
5 A Aug.2014 19 5 3
6 B Aug.2012 NA 1 NA
7 B Jun.2013 3 2 1
8 B Aug.2013 6 3 2
9 B Jun.2014 10 4 3
10 B Aug.2014 NA 5 NA
This seems like it should be possible using dplyr, but I can't figure it out.
I have tried:
dat%>%group_by(ID)%>%
drop_na(length) %>%
mutate(ind.obs=sequence(n()))
ID period length sample.event ind.obs
<chr> <chr> <dbl> <int> <int>
1 A Aug.2012 12 1 1
2 A Jun.2014 15 4 2
3 A Aug.2014 19 5 3
4 B Jun.2013 3 2 1
5 B Aug.2013 6 3 2
6 B Jun.2014 10 4 3
But as you can see, this completely removes the rows without observations.
I've also tried this, but get an error:
dat%>%group_by(ID) %>%mutate(ind.obs=sequence(n(na.rm=T)))
Error in `mutate()`:
ℹ In argument: `ind.obs = sequence(n(na.rm = T))`.
ℹ In group 1: `ID = "A"`.
Caused by error in `n()`:
! unused argument (na.rm = T)
Would appreciate any tips for resolving this, thanks
I bet there's something sharper, but:
dat %>%
mutate(ind.obs = if_else(is.na(length), length,
cumsum(!is.na(length))), .by=ID)
# Thanks @TarJae for the helpful improvement
Result
# A tibble: 10 × 5
ID period length sample.event ind.obs
<chr> <chr> <dbl> <int> <dbl>
1 A Aug.2012 12 1 1
2 A Jun.2013 NA 2 NA
3 A Aug.2013 NA 3 NA
4 A Jun.2014 15 4 2
5 A Aug.2014 19 5 3
6 B Aug.2012 NA 1 NA
7 B Jun.2013 3 2 1
8 B Aug.2013 6 3 2
9 B Jun.2014 10 4 3
10 B Aug.2014 NA 5 NA