r dataframe

How to generate a table with a list of possible entries, rather than the count?


Suppose I have a dataframe:

> df <- data.frame(
    id = 1:10,
    name = c("Bob", "Ashley", "James", "David", "Jenny",
      "Hans", "Leo", "John", "Emily", "Lee"),
    gender = c("Male", "Female", "Male", "Male", "Female", 
      "Male", "Male", "Male", "Female", "Female"))

I don't want the standard output for this:

> df
   id   name gender
1   1    Bob   Male
2   2 Ashley Female
3   3  James   Male
4   4  David   Male
5   5  Jenny Female
6   6   Hans   Male
7   7    Leo   Male
8   8   John   Male
9   9  Emily Female
10 10    Lee Female

Instead, I want to know which names go with female and which with male:

Female Male
Ashley Bob
Jenny  James
Emily  David
Lee    Hans
       Leo
       John

There are plenty of functions that return the count of each value (how many males, or how many people named James), but I've been unable to figure out how to list the names that belong to each gender.


Solution

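  • It looks like you want this: split the names by gender, pad each group with NA to the length of the largest group, and combine the result into a data frame: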

    > split(df$name, df$gender) |> lapply(`length<-`, max(table(df$gender))) |> as.data.frame()
      Female  Male
    1 Ashley   Bob
    2  Jenny James
    3  Emily David
    4    Lee  Hans
    5   <NA>   Leo
    6   <NA>  John
    

    Based on your comment, here is a small example with 10 groups instead of only 2 (no X6 column appears because no row falls into group 6 in this sample):

    > set.seed(42)
    > df <- data.frame(name=LETTERS, group=sample.int(10, 26, replace=TRUE))
    > split(df$name, df$group) |> lapply(`length<-`, max(table(df$group))) |> as.data.frame()
        X1   X2   X3 X4   X5   X7   X8   X9  X10
    1    A    G    R  F    B    K    J    D    E
    2    C    Q <NA>  L    N <NA>    Z    M    H
    3    I    Y <NA>  O    V <NA> <NA>    S    P
    4 <NA> <NA> <NA>  U    W <NA> <NA>    T <NA>
    5 <NA> <NA> <NA>  X <NA> <NA> <NA> <NA> <NA>
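
The column names X1, X2, ... in the output above appear because `as.data.frame()` converts the numeric group labels into syntactically valid names. If the original labels are preferred, they can be restored afterwards. A small sketch under that assumption; `pad_split` and the `toy` data are hypothetical names made up for this illustration, not part of the answer above:

```r
# Hypothetical helper: split `values` by `groups`, pad every group
# with NA to the length of the largest one, and return a data frame.
pad_split <- function(values, groups) {
  parts <- split(values, groups)
  n <- max(lengths(parts))
  padded <- lapply(parts, `length<-`, n)
  out <- as.data.frame(padded)
  # Restore the original group labels (e.g. "1" instead of "X1").
  names(out) <- names(parts)
  out
}

# Small made-up data set with numeric group labels.
toy <- data.frame(
  name = c("A", "B", "C", "D", "E", "F"),
  group = c(1L, 2L, 1L, 10L, 2L, 1L))

res <- pad_split(toy$name, toy$group)
print(res)
```

Columns with non-syntactic names have to be accessed with `res[["10"]]` or backticks, which is a reason one might prefer to keep the X prefix.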
    

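    Also possible, if a list of one-column data frames (using the original `df` with `name` and `gender`) is acceptable: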

    > split(df, ~gender) |> lapply(`[`, 'name')
    $Female
         name
    2  Ashley
    5   Jenny
    9   Emily
    10    Lee
    
    $Male
       name
    1   Bob
    3 James
    4 David
    6  Hans
    7   Leo
    8  John
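
As far as I know, the formula form `split(df, ~gender)` is only available from R 4.1.0 onward. A sketch of a variant that avoids both the formula and the pipe by selecting the column first and splitting by the grouping vector; the names `by_gender` and `female_names` are illustrative only:

```r
df <- data.frame(
  id = 1:10,
  name = c("Bob", "Ashley", "James", "David", "Jenny",
           "Hans", "Leo", "John", "Emily", "Lee"),
  gender = c("Male", "Female", "Male", "Male", "Female",
             "Male", "Male", "Male", "Female", "Female"))

# df["name"] keeps the data frame structure (one column),
# so split() returns a list of one-column data frames.
by_gender <- split(df["name"], df$gender)
print(by_gender)

# Each element is still a data frame; extract the plain
# character vector when only the names are needed.
female_names <- by_gender$Female$name
print(female_names)
```

The original row numbers are kept as row names, which can help when tracing an entry back to `df`.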
    

    You may also want to consider a plain list with one vector of names per gender:

    > split(df$name, df$gender)
    $Female
    [1] "Ashley" "Jenny"  "Emily"  "Lee"   
    
    $Male
    [1] "Bob"   "James" "David" "Hans"  "Leo"   "John" 
    

    or, with the names sorted within each group:

    > split(df$name, df$gender) |> lapply(sort)
    $Female
    [1] "Ashley" "Emily"  "Jenny"  "Lee"   
    
    $Male
    [1] "Bob"   "David" "Hans"  "James" "John"  "Leo"  
    

    Data:

    > dput(df)
    structure(list(id = 1:10, name = c("Bob", "Ashley", "James", 
    "David", "Jenny", "Hans", "Leo", "John", "Emily", "Lee"), gender = c("Male", 
    "Female", "Male", "Male", "Female", "Male", "Male", "Male", "Female", 
    "Female")), class = "data.frame", row.names = c(NA, -10L))