I have two datasets, df1 and df2, that differ only in their row names. How can I print them side by side efficiently, with rows matched by name? Many thanks in advance.
df1 <- mtcars[1:6, 1:3]; rownames(df1)
df2 <- df1; rownames(df2) <- c("Mazda RX4","Mazda RX4 Wag","Datsun 710","Hornet 4 Drive",
"Hornet Sportabout","NEW.NAME"); rownames(df2)
df3 <- cbind(df1,df2); df3
Expected outcome:
                   mpg cyl disp  mpg cyl disp
Mazda RX4         21.0   6  160 21.0   6  160
Mazda RX4 Wag     21.0   6  160 21.0   6  160
Datsun 710        22.8   4  108 22.8   4  108
Hornet 4 Drive    21.4   6  258 21.4   6  258
Hornet Sportabout 18.7   8  360 18.7   8  360
Valiant           18.1   6  225   \\  \\   \\
NEW.NAME            \\  \\   \\ 18.1   6  225
I’m not a massive fan of row names (I would even consider relying on them bad practice, or evil).
There is an easy way to extract the row name information into a new column using data.table.
In your case I'd go for:
library(data.table)
library(hablar)
setDT(df1, keep.rownames = TRUE)
setDT(df2, keep.rownames = TRUE)
# Bind and keep unique rows
df3 <- unique(rbind(df1, df2))
df3
#>                   rn  mpg cyl disp
#> 1:         Mazda RX4 21.0   6  160
#> 2:     Mazda RX4 Wag 21.0   6  160
#> 3:        Datsun 710 22.8   4  108
#> 4:    Hornet 4 Drive 21.4   6  258
#> 5: Hornet Sportabout 18.7   8  360
#> 6:           Valiant 18.1   6  225
#> 7:          NEW.NAME 18.1   6  225
If you want to keep the original sources identified, I'd do:
# Rename the value columns so each one carries the name of its source data frame
old <- setdiff(names(df1), "rn")
new <- paste0(old, "_df1")
setnames(df1, old, new)
new <- paste0(old, "_df2")
setnames(df2, old, new)
# Column names now differ, so fill = TRUE pads the missing columns with NA
df3 <- unique(rbind(df1, df2, fill = TRUE))
# Collapse to one row per rn; hablar::sum_ ignores NA but returns NA when every value is NA
df3 <-
df3[, lapply(lapply(.SD, hablar::sum_), as.numeric), by = "rn"]
df3
#>                   rn mpg_df1 cyl_df1 disp_df1 mpg_df2 cyl_df2 disp_df2
#> 1:         Mazda RX4    21.0       6      160    21.0       6      160
#> 2:     Mazda RX4 Wag    21.0       6      160    21.0       6      160
#> 3:        Datsun 710    22.8       4      108    22.8       4      108
#> 4:    Hornet 4 Drive    21.4       6      258    21.4       6      258
#> 5: Hornet Sportabout    18.7       8      360    18.7       8      360
#> 6:           Valiant    18.1       6      225      NA      NA       NA
#> 7:          NEW.NAME      NA      NA       NA    18.1       6      225
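If you would rather avoid the extra hablar dependency, the same side-by-side layout can probably be reached with a full outer join on the row-name column. The following is only a sketch of that alternative, not the approach used above. It rebuilds df1 and df2 from mtcars because setDT() and setnames() modify their inputs by reference, and the names dt1, dt2 and res are just illustrative.

```r
library(data.table)

# Rebuild the example data from the question
df1 <- mtcars[1:6, 1:3]
df2 <- df1
rownames(df2)[6] <- "NEW.NAME"

# as.data.table() returns a copy, so df1 and df2 keep their row names;
# keep.rownames = "rn" stores the row names in a column called "rn"
dt1 <- as.data.table(df1, keep.rownames = "rn")
dt2 <- as.data.table(df2, keep.rownames = "rn")

# Full outer join on rn: rows present in only one table get NA on the other side.
# suffixes tags the duplicated column names with their source.
res <- merge(dt1, dt2, by = "rn", all = TRUE,
             suffixes = c("_df1", "_df2"), sort = FALSE)
res
```

This should give one row per distinct row name, with NA in the df2 columns for Valiant and in the df1 columns for NEW.NAME. Unlike the sum-based collapse, a join does not add values together if a row name happens to occur more than once within one table.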