I have a dataframe of transactions (roughly 76K rows). Each row has a column called START_DATE which is when the transaction started.
I am trying to filter down to transactions with START_DATE > 1/1/2023.
I am doing this:
str(A_23$START_DATE) ##IT STARTS OUT AS A CHR
A_23$START_DATE <- as.Date(A_23$START_DATE, format = "%Y-%m-%d")
str(A_23$START_DATE) ##THIS CONFIRMS IT IS A DATE
T_Date <- as.Date("2023-01-01", format = "%Y-%m-%d") ##USING A PLACEHOLDER VARIABLE TO VALIDATE STR()
str(T_Date) ##THIS CONFIRMS IT IS A DATE
A_23 <- A_23[A_23$START_DATE >= T_Date,] ##THIS CUTS THE DATAFRAME DOWN FROM 76K ROWS to 9K ROWS (IT WORKS)
head(A_23) ##EVERYTHING IS NA, THE ENTIRE FRAME
What am I doing wrong? Why is this causing all of my data to get erased but R knows how many rows?
Here is a possible solution:
library(dplyr)
library(lubridate)
df <- data.frame(
  start_date = c("1/2/2021", "5/11/2020", "1/2/2021", "5/11/2020"),
  item = c("A", "B", "C", "D")
)
# parse day/month/year strings; dmy() already returns a Date, so as_date() is not needed
df <- df |>
  mutate(start_date = dmy(start_date))
# filter date: keep the rows whose start_date is not 2021-02-01
date_filter <- df |>
  filter(!start_date %in% ymd("2021-02-01"))
date_filter
  start_date item
1 2020-11-05    B
2 2020-11-05    D
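As for why the original subset came back full of NA: a likely cause (an assumption, since the raw values are not shown) is that START_DATE is stored as text like 1/15/2023 rather than 2023-01-15. as.Date() returns NA for any string that does not match the format it is given, the comparison against T_Date is then NA as well, and indexing a data frame with an NA logical returns a row of all NA for each NA. That would explain why R still reports a row count. Here is a minimal base R sketch of that behaviour; the toy data frame, the AMOUNT column and the "%m/%d/%Y" format are all made up for illustration:

```r
# Hypothetical toy data: month/day/year text (assumed, not taken from the real data)
toy <- data.frame(
  START_DATE = c("1/15/2023", "6/30/2022", "3/1/2023"),
  AMOUNT = c(10, 20, 30)
)
cutoff <- as.Date("2023-01-01")

# The format does not match the text, so every value parses to NA
bad <- as.Date(toy$START_DATE, format = "%Y-%m-%d")

# An NA in a logical row index produces a row filled with NA
na_rows <- toy[bad >= cutoff, ]

# Parse with a format that matches the text, then guard against NA when subsetting
good <- as.Date(toy$START_DATE, format = "%m/%d/%Y")
keep <- !is.na(good) & good >= cutoff
fixed <- toy[keep, ]
```

Running sum(is.na(A_23$START_DATE)) right after the as.Date() call should show whether this is what is happening. Note that dplyr's filter() drops rows where the condition is NA, so the same comparison inside filter() would not produce the all-NA rows, but the dates still need to be parsed with the right format first.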
Hope this helps!