r, data-manipulation, stargazer, xtable

report datasets with different row names


I have two datasets, df1 and df2, that are identical except for one row name. How can I print them side by side efficiently? Many thanks in advance.

df1 <- mtcars[1:6, 1:3]; rownames(df1)
df2 <- df1; rownames(df2) <- c("Mazda RX4","Mazda RX4 Wag","Datsun 710","Hornet 4 Drive",
                              "Hornet Sportabout","NEW.NAME"); rownames(df2)
df3 <- cbind(df1,df2); df3

Expected outcome:

                   mpg cyl disp  mpg cyl disp
Mazda RX4         21.0   6  160 21.0   6  160
Mazda RX4 Wag     21.0   6  160 21.0   6  160
Datsun 710        22.8   4  108 22.8   4  108
Hornet 4 Drive    21.4   6  258 21.4   6  258
Hornet Sportabout 18.7   8  360 18.7   8  360
Valiant           18.1   6  225   \\  \\   \\
NEW.NAME            \\  \\   \\ 18.1   6  225

    

Solution

  • I'm not a big fan of row names (I'd even call relying on them bad practice). data.table gives you an easy way to move the row names into a regular column: setDT() with keep.rownames = TRUE stores them in a column named rn.

    In your case I'd go for:

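    library(data.table)
    library(hablar)  # only needed for sum_() further down
    # Convert to data.table by reference; row names go into a column named "rn"
    setDT(df1, keep.rownames = TRUE)
    setDT(df2, keep.rownames = TRUE)
    # Stack both tables and drop the duplicated rows
    df3 <- unique(rbind(df1, df2))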
    df3
    #>                   rn  mpg cyl disp
    #> 1:         Mazda RX4 21.0   6  160
    #> 2:     Mazda RX4 Wag 21.0   6  160
    #> 3:        Datsun 710 22.8   4  108
    #> 4:    Hornet 4 Drive 21.4   6  258
    #> 5: Hornet Sportabout 18.7   8  360
    #> 6:           Valiant 18.1   6  225
    #> 7:          NEW.NAME 18.1   6  225
    

    If you want to keep track of which source each value came from, I'd do:

    # Suffix the value columns so each source stays identifiable
    old <- setdiff(names(df1), "rn")
    new <- paste0(old, "_df1")
    setnames(df1, old, new)
    new <- paste0(old, "_df2")
    setnames(df2, old, new)
    # The column names now differ, so fill = TRUE pads the missing ones with NA
    df3 <- unique(rbind(df1, df2, fill = TRUE))
    # Collapse to one row per rn. hablar::sum_ ignores NA but returns NA
    # when every value is NA, so the gaps survive.
    df3 <-
        df3[, lapply(lapply(.SD, hablar::sum_), as.numeric), by = "rn"]
    df3
    #>                   rn mpg_df1 cyl_df1 disp_df1 mpg_df2 cyl_df2 disp_df2
    #> 1:         Mazda RX4    21.0       6      160    21.0       6      160
    #> 2:     Mazda RX4 Wag    21.0       6      160    21.0       6      160
    #> 3:        Datsun 710    22.8       4      108    22.8       4      108
    #> 4:    Hornet 4 Drive    21.4       6      258    21.4       6      258
    #> 5: Hornet Sportabout    18.7       8      360    18.7       8      360
    #> 6:           Valiant    18.1       6      225      NA      NA       NA
    #> 7:          NEW.NAME      NA      NA       NA    18.1       6      225
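
For comparison, the same wide layout can probably be reached with a plain full outer join, which avoids both the renaming step and hablar. This is only a minimal base R sketch, not what the answer above does. The names a, b and wide are made up for illustration, and the inputs are rebuilt from mtcars so the block runs on its own.

```r
# Rebuild the question's inputs so the sketch is self-contained.
df1 <- mtcars[1:6, 1:3]
df2 <- df1
rownames(df2)[6] <- "NEW.NAME"

# Move the row names into an ordinary key column (called "rn" here,
# to match the data.table output above).
a <- data.frame(rn = rownames(df1), df1, row.names = NULL,
                stringsAsFactors = FALSE)
b <- data.frame(rn = rownames(df2), df2, row.names = NULL,
                stringsAsFactors = FALSE)

# Full outer join on the key: all = TRUE keeps rows that exist on only
# one side and fills the other side with NA. The suffixes mark the source.
wide <- merge(a, b, by = "rn", all = TRUE, suffixes = c("_df1", "_df2"))

# merge() sorts by the key, so restore the order of first appearance.
wide <- wide[match(union(a$rn, b$rn), wide$rn), ]
rownames(wide) <- NULL

print(wide)
```

The result has one row per name and one set of columns per source, with NA where a name is missing from df1 or df2. If the goal is a report table, a data frame like wide can then be handed to a printing package such as xtable.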