I have two dataframe : df1
and df2
. df2
is composed of rows from df1
. For each row of df2
, I want its index in df1
.
For example :
df1 <- data.frame(animal=c('koala', 'hedgehog', 'sloth', 'panda'),
country=c('Australia', 'Italy', 'Peru', 'China'),
avg_sleep_hours=c(21, 18, 17, 10))
df2 <- data.frame(animal=c('koala', 'sloth', 'panda', 'panda'),
country=c('Australia', 'Peru', 'China', 'China'),
avg_sleep_hours=c(21,17,10,10))
I want to get
1 3 4 4
I searched on Internet but found no satisfactory answer, so I wrote my own code.
I know that findIdxRow
could return several numbers if a row of df2
is duplicates in df1
but it won't appear in my data, so I didn't take the time to deal with that.
findIdxRow <- function(row, df)
{
n <- nrow(df)
is_equal <- sapply(1:n, function(i) all(row==df[i,]))
return(which(is_equal))
}
indexes <- sapply(1:nrow(df2), function(i) findIdxRow(df2[i,],df1))
This code works, but I wonder if there is a shorter way to write it.
You can use match
on top of paste
like below
match(do.call(paste, df2), do.call(paste, df1))
set.seed(42)
n <- 5E3
library(dplyr)
df1 <- data.frame(
animal = ids::adjective_animal(n),
country = ids::proquint(n, n_words = 1))
df2 <- df1 |>
slice_sample(n = n)
f1 <- \() {
a <- merge(
df2 |> transform(index_orig = 1:nrow(df2)),
df1 |> transform(index = 1:nrow(df1))
)
a[order(a$index_orig), ]$index
}
f2 <- \() {
df2 |>
left_join(df1 |> mutate(index = row_number()), by = join_by(animal, country)) |>
pull(index)
}
f3 <- \() {
match(do.call(paste, df2), do.call(paste, df1))
}
microbenchmark(
f1(),
f2(),
f3(),
unit = "relative",
check = "equal"
)
which shows
Unit: relative
expr min lq mean median uq max neval
f1() 21.351795 16.785351 15.511564 15.514184 14.906641 9.398281 100
f2() 3.882332 3.063971 2.929195 2.893312 2.848455 2.389725 100
f3() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100