rdataframedplyr

Join merge two data frames based in string names and adding the mean of other columns


I need to merge two data frames by the species columns (present in both data frames) and add the mean values from two other two columns of the data frame 2 (df2).

Example:

df1 <- as.data.frame(rbind("A","B","C"))
colnames(df1) <- "species"

df1 <-
species
  A
  B
  C

a <- rbind(replicate(2,sample(1:5,5,rep=TRUE)))
b <- rbind(replicate(2,sample(2:8,10,rep=TRUE)))
c <- rbind(replicate(2,sample(4:11,6,rep=TRUE)))

a1 <- cbind(replicate(5, "A"),a)
b1 <- cbind(replicate(10, "B"),b)
c1 <- cbind(replicate(6, "C"),c)
            
df2 <- as.data.frame(rbind(a1,b1,c1))
colnames(df2) <- c("species", "variable1", "variable2")
df2[, c(2:3)] <- sapply(df2[, c(2:3)], as.numeric)

df2 <-
    species variable1 variable2
       A         4         3
       A         3         4
       A         5         4
       A         3         3
       A         1         4
       B         6         7
       B         8         5
       B         5         8
       B         3         5
       B         2         2
       B         5         7
       B         5         7
       B         4         3
       B         4         2
       B         6         7
       C        11         4
       C        10         9
       C         6         5
       C         6        10
       C         8        11
       C         6         4

The outcome I'm looking for is the following:

result_df <- 
     mean_variable1   mean_variable2
  A        X               X
  B        X               X
  C        X               X

I need the mean of the two variables for each species of the df1 (A, B and C).

Any idea on how to do this maybe using dplyr?


Solution

  • I think the below would work? If you need to join on multiple character columns that exist in both, can do by = c("var1", "var2", ...)

      result_df <-  df2 %>%  group_by(species) %>% 
      summarize(mean_variable1 = mean(variable1)
                ,mean_variable2 = mean(variable2)) %>% 
      inner_join(df1, by = c("species"))