r dplyr

Use of which() inside of mutate() in dplyr


I have been using which() to define a new variable inside mutate in the following way.

I have two data frames:

df1 <- data.frame(
  telephone = c("1231231234", "2342342345", "3453453456", "1231231234"),
  email = c("a@email.com", "b@email.com", "c@email.com", "d@mail.com")
)

df2 <- data.frame(
  phone = c("1231231234", "2342342345", "3453453456")
)

What I am attempting to do is add a new variable to df2 that lists, for each value of "phone", the row numbers of df1 at which that value occurs in df1$telephone. I have attempted the following:

library(dplyr)
library(stringr)  # for str_flatten()

df2 <- df2 %>%
  mutate(
    phone_ind = str_flatten(which(df1$telephone == phone), collapse = ", ")
  )

This yields the warning

Warning message: Problem while computing phone_ind = str_flatten(which(df1$telephone == phone), collapse = ", ").

ℹ longer object length is not a multiple of shorter object length

as well as the following obviously incorrect results:

df2 <- data.frame(
  phone = c("1231231234", "2342342345", "3453453456"),
  phone_ind = c("1, 2, 3, 4", "1, 2, 3, 4", "1, 2, 3, 4")
)

Any ideas? Thank you in advance for any insight.


Solution

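  • library(dplyr) # v1.1.0

    df2 %>%
      # Number the rows of df1 and rename telephone to phone so the join keys match
      left_join(transmute(df1, phone = telephone, row = row_number()), multiple = "all") %>%
      # Collapse all matching row numbers into one string per phone
      summarize(row = paste(row, collapse = ", "), .by = phone)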
    

    Result

    Joining with `by = join_by(phone)`
           phone  row
    1 1231231234 1, 4
    2 2342342345    2
    3 3453453456    3
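
For context on why the original attempt misbehaves: inside mutate(), `phone` refers to the whole column rather than a single row's value, so `df1$telephone == phone` compares a length-4 vector with a length-3 vector. R recycles the shorter one, which produces the warning, and str_flatten() then returns one string that is repeated on every row. If a join feels heavier than needed, a row-wise lookup is one possible alternative. The following is a minimal sketch, not part of the answer above; the name `res` is made up for illustration, and the output column is called `phone_ind` to match the question.

```r
library(dplyr)

df1 <- data.frame(
  telephone = c("1231231234", "2342342345", "3453453456", "1231231234"),
  email = c("a@email.com", "b@email.com", "c@email.com", "d@mail.com")
)
df2 <- data.frame(phone = c("1231231234", "2342342345", "3453453456"))

# `res` is a hypothetical name for the result.
# vapply() calls the function once per element of `phone`, so each
# comparison is a full df1 column against a single value `p`.
res <- df2 %>%
  mutate(
    phone_ind = vapply(
      phone,
      function(p) paste(which(df1$telephone == p), collapse = ", "),
      character(1),
      USE.NAMES = FALSE
    )
  )

res$phone_ind
# "1, 4" "2" "3"
```

This does one scan of df1 per row of df2, so for large data frames the join-based solution above is likely to scale better.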