Tags: r, date, subset, lubridate, chron

R subsetting a big dataframe based on date values


I have a data frame with the following structure:

    'data.frame': 4371 obs. of 6 variables:
     $ tg     : num  0.0403 0.0404 0.0404 0.0404 0.0405 ...
     $ date   : Factor w/ 4371 levels "2/20/2020 10:00",..: 841 842 843 844 845 846 847 848 849 850 ...
     $ lgp_bar: int  497 497 497 497 497 497 497 497 494 494 ...
     $ lgt    : num  87.8 87.8 87.8 87.8 87.8 ...
     $ ugp_bar: int  451 451 451 451 451 451 451 450 447 447 ...
     $ ugt    : num  71.9 71.9 71.9 71.9 71.9 ...

I need to subset this data frame between two date-times, e.g. from 2/24/2020 17:00 to 2/26/2020 02:00. Being a novice with date data types, I am unable to do this simple task. I have tried the following code without any success. In an Excel spreadsheet this would take me two minutes.

    humm <- read.csv("book1.csv", header = TRUE)
    humm$datenumber <- as_datetime(humm$date)
    dts <- as.character(cbind("02/22/2020 02:00", "02/23/20 10:00"))

    hummfilter <- subset(humm, humm$date >= dts[1]) # || date <= dts[2])

    hummfilter <- as.data.frame(humm[humm$date >= dts[1] | humm$date <= dts[2], ], na.rm = TRUE)
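A minimal sketch of why these attempts return nothing useful, assuming a hypothetical three-row toy data frame in place of book1.csv (the `tg` values and timestamps are made up). It shows that a factor cannot be compared with `>=`, that comparing the raw strings orders them as text rather than as dates, and that `|` keeps rows matching either bound while `&` keeps only rows inside the range. It uses a four-digit year for the upper bound, because the question's "02/23/20 10:00" would not be read as 2020 by a `%Y` format.

```r
# Hypothetical three-row toy data; `date` is a factor, as in the str() output
humm <- data.frame(
  tg   = c(0.0403, 0.0404, 0.0405),
  date = factor(c("2/22/2020 01:00", "2/22/2020 03:00", "2/23/2020 11:00"))
)
fmt <- "%m/%d/%Y %H:%M"

# 1. Factor vs. string: `>=` is not meaningful for factors, so R warns and returns NA
cmp_factor <- suppressWarnings(humm$date >= "02/22/2020 02:00")

# 2. String vs. string: compared as text, so the leading "2" sorts after
#    the leading "0" regardless of the actual date and time
cmp_char <- as.character(humm$date) >= "02/22/2020 02:00"

# 3. Parsed date-times (same format and time zone everywhere) compare chronologically
when  <- as.POSIXct(as.character(humm$date), format = fmt, tz = "GMT")
lower <- as.POSIXct("02/22/2020 02:00", format = fmt, tz = "GMT")
upper <- as.POSIXct("02/23/2020 10:00", format = fmt, tz = "GMT")
in_or  <- when >= lower | when <= upper  # `|`: a row passes if EITHER condition holds
in_and <- when >= lower & when <= upper  # `&`: a row must lie between the bounds

print(cmp_factor)  # NA NA NA
print(cmp_char)    # TRUE TRUE TRUE
print(in_or)       # TRUE TRUE TRUE
print(in_and)      # FALSE TRUE FALSE
```

Only the third comparison, on parsed date-times combined with `&`, selects the row that actually falls inside the window.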

Solution

  • You can convert the date column to POSIXct and then subset.

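    Using base R (parse the column in the same time zone as the bounds; otherwise the window is shifted by the difference between your local time zone and GMT):

    humm$date <- as.POSIXct(as.character(humm$date), format = '%m/%d/%Y %H:%M', tz = 'GMT')
    subset(humm, date >= as.POSIXct('02/24/2020 17:00', format = '%m/%d/%Y %H:%M', tz = 'GMT') &
                 date <= as.POSIXct('02/26/2020 02:00', format = '%m/%d/%Y %H:%M', tz = 'GMT'))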
    

    Or with dplyr and lubridate (`mdy_hm()` parses in UTC by default, so the column and the bounds share a time zone):

    library(dplyr)
    library(lubridate)
    
    humm %>%
      mutate(date = mdy_hm(date)) %>%
      filter(between(date, mdy_hm('02/24/2020 17:00'), mdy_hm('02/26/2020 02:00')))
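As a self-contained check, here is a sketch that wraps the base R approach in a hypothetical helper, `subset_between()` (not part of base R or any package), and runs it on made-up hourly data covering 2/24/2020 15:00 to 2/26/2020 04:00. It assumes the `%m/%d/%Y %H:%M` layout from the question and GMT for both the column and the bounds.

```r
# Hypothetical helper: keep rows of `df` whose column `col` lies in [from, to]
subset_between <- function(df, col, from, to,
                           fmt = "%m/%d/%Y %H:%M", tz = "GMT") {
  # Parse the column and both bounds with the same format and time zone
  when  <- as.POSIXct(as.character(df[[col]]), format = fmt, tz = tz)
  lower <- as.POSIXct(from, format = fmt, tz = tz)
  upper <- as.POSIXct(to,   format = fmt, tz = tz)
  # Rows whose date failed to parse (NA) are dropped instead of becoming NA rows
  keep <- !is.na(when) & when >= lower & when <= upper
  df[keep, , drop = FALSE]
}

# Made-up hourly toy data standing in for book1.csv
times <- seq(as.POSIXct("2020-02-24 15:00", tz = "GMT"),
             as.POSIXct("2020-02-26 04:00", tz = "GMT"),
             by = "hour")
humm <- data.frame(
  tg   = seq(0.0403, by = 0.0001, length.out = length(times)),
  # Mimic the question's "2/20/2020 10:00" style: no leading zero on the month
  date = factor(sub("^0", "", format(times, "%m/%d/%Y %H:%M", tz = "GMT")))
)

hummfilter <- subset_between(humm, "date", "02/24/2020 17:00", "02/26/2020 02:00")
nrow(hummfilter)  # 34
```

Both bounds are inclusive, so the 34 hourly rows from 17:00 on the 24th through 02:00 on the 26th are kept, and the original `date` column is returned unchanged.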