rdataframedplyrtidyversewrite.table

Create new column based on whether date and time at which a species occurs is between other dates, over multiple time periods


I have a dataframe (df1) with multiple rows species, and the date-time event at which that species occurs that looks like this:

df1 <- as.data.frame(sample(seq(from=as.POSIXct("2023-07-01 00:00"), 
            to=as.POSIXct("2023-07-01 00:20"), by="sec"), 21))
df1
colnames(df1) <- c('day.hour.df1') #rename column of df2
df1$Species <- c("a", "b", "b", "a", "c", "NA", "a", "a", "c", "b", "b", 
                 "c", "c", "NA", "a", "a", "b", "b", "a", "NA", "b") 
                 # add species column
names(df1)

And I have a second dataframe (df2) that, as columns, contains a start (as day and time) (START.df2) an end (END.df2), and species (here "a"). the second dataframe looks like follows:

df2 <- as.data.frame(seq(from=as.POSIXct("2023-07-01 00:00:00"), 
                     to=as.POSIXct("2023-07-01 00:20:00"), by="min"))
df2
df2$time2 <- (seq(from=as.POSIXct("2023-07-01 00:00:59"), 
              to=as.POSIXct("2023-07-01 00:20:59"), by="min"))
names(df1)
df2$species <- (c('a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 
                  'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'))
df2

colnames(df1) <- c('START.df2', 'END.df2', 'Species')

I want to add to the second dataframe (df2) a new column called "occurrence_a", where, if the species "a" contained in df1 occurs at a day_time (day.hour.df1) in between START (START.df2) and END (END.df2), of dataframe 2 (df2), it gives a 1, otherwise, it gives 0.

I tried this but without success:

library(dplyr)

df2 %>% left_join(df1, by = "Species") %>% 
  mutate( = between(day.hour.df2, START.df1, END.df1)) %>% 
  group_by(species, day.hour.df1) %>% 
  summarise(OCCURRENCEa = any(OCCURRENCEa))
df2

Solution

  • I'm not entirely sure what output you want, but I think this should be close at least:

    left_join(df2, df1, join_by(Species, between(y$day.hour.df1, x$START.df2, x$END.df2))) %>% 
      group_by(Species, START.df2, END.df2) %>% 
      summarise(
        OCCURRENCE = any(!is.na(day.hour.df1)) %>% as.numeric(), 
        .groups = 'drop'
      ) %>% 
      pivot_wider(names_from = Species, values_from = OCCURRENCE, names_prefix = 'OCCURRENCE')
    

    You don't really need the pivotting and such for this example, but I am assuming you have multiple species in your real data and want different columns for those.