I have a data frame with 35 columns and about 250,000 rows. Depending on the values in the year column and the network_id column, I want to remove some rows. The specification of which to remove is given in this list:
remove.nets <- list(r19=c(14, 31),
                    r21=c(31),
                    r23=c(32),
                    r24=c(1, 4, 8, 24, 30, 59))
So if the year is 2019 and the network ID is either 14 or 31, remove the row, and similarly for the other years. I tried something like this:
test.data2 <- test.data %>%
{if (year==2019) filter(., !network_id %in% remove.nets$r19)}
This seemed to me to be the obvious way to do it, but it didn't work. It threw an error that I don't understand:
Error in year == 2019 :
comparison (==) is possible only for atomic and list types
I had to make a data frame out of the remove.nets list and do an anti_join like this:
remove.nets <- data.frame(year=c(2019, 2019, 2021, 2023, rep(2024, 6)),
                          network_id=c(14, 31, 31, 32, 1, 4, 8, 24, 30, 59))
test.data2 <- test.data %>%
  anti_join(remove.nets, by=c("year", "network_id"))
This works, but it's aesthetically unpleasing. Can anyone help me make it easier and prettier?
There's nothing aesthetically unpleasing about anti_join. To get the data frame from the list, just do:
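# One row per (year, network_id) pair. sub() returns character, so convert
# to numeric to match the type of test.data$year in the join.
remove.nets.df <- data.frame(year=rep(as.numeric(sub('r', '20', names(remove.nets))),
                                      lengths(remove.nets)),
                             network_id=unlist(remove.nets))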
And then:
library(dplyr)
anti_join(test.data, remove.nets.df, by=c("year", "network_id"))
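To check the approach end to end, here is a minimal sketch on a toy data frame. The values in test.data below are made up for illustration (the real frame has 35 columns), and the sketch assumes year is stored as a number, as in the question's own data frame.

```r
library(dplyr)

# Hypothetical stand-in for the real test.data
test.data <- data.frame(
  year       = c(2019, 2019, 2019, 2021, 2023, 2024, 2024),
  network_id = c(14, 31, 2, 31, 32, 59, 7)
)

remove.nets <- list(r19 = c(14, 31),
                    r21 = c(31),
                    r23 = c(32),
                    r24 = c(1, 4, 8, 24, 30, 59))

# Expand the list into one row per (year, network_id) pair.
# "r19" -> "2019" -> 2019; lengths() gives the repeat count per year.
remove.nets.df <- data.frame(
  year       = rep(as.numeric(sub("r", "20", names(remove.nets))),
                   lengths(remove.nets)),
  network_id = unlist(remove.nets, use.names = FALSE)
)

# Keep only the rows of test.data with no match in remove.nets.df
kept <- anti_join(test.data, remove.nets.df, by = c("year", "network_id"))
print(kept)
```

With this toy input only the rows (2019, 2) and (2024, 7) survive. As for the original attempt: if() expects a single TRUE or FALSE rather than a per-row condition, and inside the braces year is not looked up as a column of the data frame. The error message suggests it resolved to something that is not a vector, for example a function such as lubridate::year if that package happens to be attached.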