
Python/R: If two columns have the same values in multiple rows, sum the 3rd column and average the 4th, 5th and 6th columns


Input:

0 77 1 2 3 5
0 78 2 4 6 1
0 78 1 2 3 5
3 79 0 4 5 2
3 79 6 8 2 1
3 79 1 2 3 1

Output: (for rows that share the same values in the first two columns, sum the 3rd column and take the mean of the 4th, 5th and 6th columns)

0 77 1.0 2.0 3.0 5.0
0 78 3.0 3.0 4.5 3.0
3 79 7.0 4.7 3.3 1.3
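
For the Python side of the question, the rule can be checked against the sample input with a short plain-Python sketch. This is only an illustration under stated assumptions: the function name `aggregate` and the tuple-based row layout are hypothetical, and it uses only the standard library rather than a dataframe package.

```python
from collections import defaultdict

# Input rows from the question: two key columns followed by four value columns.
rows = [
    (0, 77, 1, 2, 3, 5),
    (0, 78, 2, 4, 6, 1),
    (0, 78, 1, 2, 3, 5),
    (3, 79, 0, 4, 5, 2),
    (3, 79, 6, 8, 2, 1),
    (3, 79, 1, 2, 3, 1),
]


def aggregate(rows):
    """Group by the first two columns; sum the 3rd, average the 4th to 6th."""
    # Collect the value columns of every row under its (col1, col2) key.
    groups = defaultdict(list)
    for row in rows:
        groups[row[:2]].append(row[2:])

    result = []
    for key in sorted(groups):
        values = groups[key]
        n = len(values)
        # 3rd column: sum over the group (as a float, to match the question).
        total = float(sum(v[0] for v in values))
        # 4th to 6th columns: per-group mean, rounded to one decimal place.
        means = [round(sum(v[i] for v in values) / n, 1) for i in (1, 2, 3)]
        result.append((*key, total, *means))
    return result


for line in aggregate(rows):
    print(*line)
```

Running it prints one line per distinct pair of values in the first two columns. For the last group the 4th-column mean is 14/3, which rounds to 4.7. With pandas, the same grouping would typically be expressed through `groupby(...).agg(...)`.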

Solution

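  • We can use dplyr in R. Group by the first two columns, use mutate to replace the 3rd column ('V3') with its per-group sum, then use summarise_each to take the mean of columns V3:V6, rounded to one decimal place. Because V3 is constant within each group after the mutate step, its mean equals the group sum.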

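    library(dplyr)
    # Note: summarise_each() and funs() are deprecated in recent dplyr releases.
    res <- df1 %>%
             group_by(V1, V2) %>%      # one group per (V1, V2) pair
             mutate(V3=sum(V3))  %>%   # V3 becomes the per-group sum
             summarise_each(funs(round(mean(.),1)), V3:V6)  # per-group means, 1 decimal
    as.data.frame(res)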
    #  V1 V2 V3  V4  V5  V6
    #1  0 77  1 2.0 3.0 5.0
    #2  0 78  3 3.0 4.5 3.0
    #3  3 79  7 4.7 3.3 1.3
    

    Data:

    df1 <- structure(list(V1 = c(0L, 0L, 0L, 3L, 3L, 3L), V2 = c(77L, 78L, 
    78L, 79L, 79L, 79L), V3 = c(1L, 2L, 1L, 0L, 6L, 1L), V4 = c(2L, 
    4L, 2L, 4L, 8L, 2L), V5 = c(3L, 6L, 3L, 5L, 2L, 3L), V6 = c(5L, 
    1L, 5L, 2L, 1L, 1L)), .Names = c("V1", "V2", "V3", "V4", "V5", 
    "V6"), class = "data.frame", row.names = c(NA, -6L))