Suppose I have a dataframe:
> df <- data.frame(
    id = 1:10,
    name = c("Bob", "Ashley", "James", "David", "Jenny",
             "Hans", "Leo", "John", "Emily", "Lee"),
    gender = c("Male", "Female", "Male", "Male", "Female",
               "Male", "Male", "Male", "Female", "Female"))
I don't want the standard output for this:
> df
id name gender
1 1 Bob Male
2 2 Ashley Female
3 3 James Male
4 4 David Male
5 5 Jenny Female
6 6 Hans Male
7 7 Leo Male
8 8 John Male
9 9 Emily Female
10 10 Lee Female
Instead, I want to know which names go with female and which with male:
| Female | Male |
|---|---|
| Ashley | Bob |
| Jenny | James |
| Emily | David |
| Lee | Hans |
| | Leo |
| | John |
There are plenty of functions that return the count of each (how many males, or how many people named James), but I've been unable to figure out how to just list the names that go with each gender.
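Apparently you want this: split the names by gender, pad each group with NA up to the size of the largest group, and combine the result into a data frame.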
> split(df$name, df$gender) |> lapply(`length<-`, max(table(df$gender))) |> as.data.frame()
Female Male
1 Ashley Bob
2 Jenny James
3 Emily David
4 Lee Hans
5 <NA> Leo
6 <NA> John
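The padding step relies on the fact that assigning a longer length to a vector fills the new positions with NA. As a rough sketch, the same pipeline can be written out as a function; the name `names_by_group` is made up for illustration and is not part of base R or any package:

```r
# Hypothetical helper (not from base R or any package):
# one column per level of `g`, shorter groups padded with NA.
names_by_group <- function(x, g) {
  groups <- split(x, g)          # list with one vector per group
  n <- max(lengths(groups))      # size of the largest group
  padded <- lapply(groups, function(v) {
    length(v) <- n               # extending a vector fills with NA
    v
  })
  as.data.frame(padded)
}

df <- data.frame(
  name = c("Bob", "Ashley", "James", "David", "Jenny",
           "Hans", "Leo", "John", "Emily", "Lee"),
  gender = c("Male", "Female", "Male", "Male", "Female",
             "Male", "Male", "Male", "Female", "Female"))

res <- names_by_group(df$name, df$gender)
```

Writing the padding as an explicit anonymous function does the same thing as passing `` `length<-` `` to `lapply()`; it is only spelled out here to make the step visible.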
Based on your comment, here is a small example with up to 10 groups instead of only 2 (no name lands in group 6 in this sample, so there is no X6 column):
> set.seed(42)
> df <- data.frame(name=LETTERS, group=sample.int(10, 26, replace=TRUE))
> split(df$name, df$group) |> lapply(`length<-`, max(table(df$group))) |> as.data.frame()
X1 X2 X3 X4 X5 X7 X8 X9 X10
1 A G R F B K J D E
2 C Q <NA> L N <NA> Z M H
3 I Y <NA> O V <NA> <NA> S P
4 <NA> <NA> <NA> U W <NA> <NA> T <NA>
5 <NA> <NA> <NA> X <NA> <NA> <NA> <NA> <NA>
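The `X` prefixes in the column names appear because `as.data.frame()` turns the numeric group labels into syntactically valid names. If more descriptive names are preferred, one option is to rename the list elements before converting. A minimal sketch on a small made-up data frame (`df2` and the `group_` prefix are illustrative assumptions, not part of the question):

```r
# Small made-up data with numeric group labels (not the sampled data above).
df2 <- data.frame(name = c("A", "B", "C", "D", "E"),
                  group = c(1, 2, 1, 3, 1))

groups <- split(df2$name, df2$group)
# Rename before converting, so as.data.frame() has nothing to repair.
names(groups) <- paste0("group_", names(groups))

# Pad every group with NA to the size of the largest group, then combine.
res2 <- as.data.frame(lapply(groups, `length<-`, max(lengths(groups))))
```

Here `max(lengths(groups))` plays the same role as `max(table(df$group))` in the answer: both give the size of the largest group.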
Also possible:
> split(df, ~gender) |> lapply(`[`, 'name')
$Female
name
2 Ashley
5 Jenny
9 Emily
10 Lee
$Male
name
1 Bob
3 James
4 David
6 Hans
7 Leo
8 John
Maybe you want to consider a plain list of names per gender:
> split(df$name, df$gender)
$Female
[1] "Ashley" "Jenny" "Emily" "Lee"
$Male
[1] "Bob" "James" "David" "Hans" "Leo" "John"
or, with the names sorted alphabetically within each group:
> split(df$name, df$gender) |> lapply(sort)
$Female
[1] "Ashley" "Emily" "Jenny" "Lee"
$Male
[1] "Bob" "David" "Hans" "James" "John" "Leo"
Data:
> dput(df)
structure(list(id = 1:10, name = c("Bob", "Ashley", "James",
"David", "Jenny", "Hans", "Leo", "John", "Emily", "Lee"), gender = c("Male",
"Female", "Male", "Male", "Female", "Male", "Male", "Male", "Female",
"Female")), class = "data.frame", row.names = c(NA, -10L))