I have a large dataset on behaviour. One of my columns is called "session" and indicates in which session certain events took place. Another column is called "id" and indicates the individual that the event belongs to. So, for example, there are multiple entries with session = 230, one of them belonging to individual A and another to individual B. There is another important column called fas, which contains 1s or 0s depending on whether a certain event took place.
I know how to drop all rows from the dataset with fas == 1, but I want to drop every row of any session that had an fas, even rows where fas == 0. Example: in session 230 I saw two individuals, A and B. For A there was an fas, so fas == 1. But for B in the same session there was no fas, so fas == 0. I want to drop both rows, since within session 230 there was an fas, which means I cannot use any of that data. How do I do this?
For the example below, I would want to drop sessions 230 (rows 1 and 2) and 231 (row 3), ideally with a single line of code.
I tried the ifelse function to see whether I could create another column that marks all rows of an affected session with 1 and all other rows with 0, so that I could then drop rows conditional on the value in that column, but I could not make it work. Something like
ifelse(data$session == data[data$fas == 1,]$session, add_column(drop = 1))
It does not need to be ifelse if a better option exists.
Reproducible example:
session <- c("230", "230", "231", "232", "232")
id <- c("A","B","C","D", "E")
fas <- c(1,0,1,0,0)
df <- data.frame(session, id, fas)
session | id | fas |
---|---|---|
230 | A | 1 |
230 | B | 0 |
231 | C | 1 |
232 | D | 0 |
232 | E | 0 |
The result could look something like this, after which I could drop all rows where drop == 1:
session | id | fas | drop |
---|---|---|---|
230 | A | 1 | 1 |
230 | B | 0 | 1 |
231 | C | 1 | 1 |
232 | D | 0 | 0 |
232 | E | 0 | 0 |
Use %in% to check against the set of sessions where fas == 1:
df <- data.frame(session, id, fas)
transform(df, drop = session %in% session[fas == 1])
#>   session id fas  drop
#> 1     230  A   1  TRUE
#> 2     230  B   0  TRUE
#> 3     231  C   1  TRUE
#> 4     232  D   0 FALSE
#> 5     232  E   0 FALSE
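If the helper column is not needed, the same %in% test can be negated and used directly to subset the data frame, which drops the affected sessions in one step. This is a minimal base R sketch on the example data from the question; the names bad_sessions and clean are only illustrative, and it assumes fas contains no missing values.

```r
# Rebuild the example data from the question
session <- c("230", "230", "231", "232", "232")
id <- c("A", "B", "C", "D", "E")
fas <- c(1, 0, 1, 0, 0)
df <- data.frame(session, id, fas)

# Sessions in which at least one row has fas == 1
# (bad_sessions is an illustrative name, not from the question)
bad_sessions <- unique(df$session[df$fas == 1])

# Keep only the rows whose session is NOT in that set
clean <- df[!df$session %in% bad_sessions, ]
clean
#>   session id fas
#> 4     232  D   0
#> 5     232  E   0

# The same thing as a single line, without the intermediate variable
clean_one_line <- df[!df$session %in% df$session[df$fas == 1], ]
```

Both forms keep the original row names (4 and 5 here), so the result can still be traced back to the original rows.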