I have a data frame with 1000s of rows with a few columns where I would like to find the unique rows and mark if each column has a unique value or not.
for example, I have a data frame that looks like the following.
data frame: structure(list(X = c("Row1", "Row2", "Row3", "Row4", "Row5", "Row6", "Row7"), Col1 = c(0L, 1L, 1L, 0L, 1L, -1L, 0L), Col2 = c(1L, 0L, 1L, 1L, 1L, 0L, -2L), Col3 = c(-1L, 0L, 1L, 0L, 0L, 1L, 1L )), class = "data.frame", row.names = c(NA, -7L))
Col1 Col2 Col3
Row1 0 1 -1
Row2 1 0 0
Row3 1 1 1
Row4 0 1 0
Row5 1 1 0
Row6 -1 0 1
Row7 0 -2 1
and I would like to get the following data frame.
Col1 Col2 Col3 Col1_uni Col2_uni Col3_uni
Row1 0 1 -1 Col1 Col2 Col3
Row2 1 0 0 Col1 NA NA
Row3 1 1 1 NA NA NA
Row4 0 1 0 NA Col2 NA
Row5 1 1 0 NA NA NA
Row6 -1 0 1 Col1 Col2 Col3
Row7 0 -2 1 Col1 Col2 Col3
I have tried df=df[row.names(unique(df[,c("clo_2")])), c("clo_2")]
but that doesnt really helps.I hope somebody has a easy solution for this. thanks in advance.
We can use dplyr with rowwise()
:
library(dplyr)
df %>% rowwise %>%
mutate(across(starts_with('Col'), ~ sum(c_across(starts_with('Col')) == .x), .names = "{.col}_uni"),
across(ends_with('uni'), ~ifelse(.x==1, str_remove(deparse(substitute(.x)), '_uni$'), NA)))
# A tibble: 7 × 7
# Rowwise:
X Col1 Col2 Col3 Col1_uni Col2_uni Col3_uni
<chr> <int> <int> <int> <chr> <chr> <chr>
1 Row1 0 1 -1 Col1 Col2 Col3
2 Row2 1 0 0 Col1 NA NA
3 Row3 1 1 1 NA NA NA
4 Row4 0 1 0 NA Col2 NA
5 Row5 1 1 0 NA NA Col3
6 Row6 -1 0 1 Col1 Col2 Col3
7 Row7 0 -2 1 Col1 Col2 Col3
I would also recommend the simpler "col_names-less" approach suggested by @caldwellst, for which the simplified code in my answer would be:
library(dplyr)
df %>% rowwise %>%
mutate(across(starts_with('Col'),
~sum(c_across(starts_with('Col')) == .x) == 1,
.names = "{.col}_uni"))
# A tibble: 7 × 7
# Rowwise:
X Col1 Col2 Col3 Col1_uni Col2_uni Col3_uni
<chr> <int> <int> <int> <lgl> <lgl> <lgl>
1 Row1 0 1 -1 TRUE TRUE TRUE
2 Row2 1 0 0 TRUE FALSE FALSE
3 Row3 1 1 1 FALSE FALSE FALSE
4 Row4 0 1 0 FALSE TRUE FALSE
5 Row5 1 1 0 FALSE FALSE TRUE
6 Row6 -1 0 1 TRUE TRUE TRUE
7 Row7 0 -2 1 TRUE TRUE TRUE