rdataframealphabetical

Creating new column by combining two columns, but variables need to be in alphabetical order


I have the following dataset:

ID<-c("A","B","C","D","E")
Fruits1<-c("orange","apple","pineapple","apple","pineapple")
Fruits2<-c("apple","orange","apple","pineapple","orange")
data<-tibble(ID,Fruits1,Fruits2)
data
# A tibble: 5 × 3
ID    Fruits1   Fruits2  
<chr> <chr>     <chr>    
1 A     orange    apple    
2 B     apple     orange   
3 C     pineapple apple    
4 D     apple     pineapple
5 E     Pineapple orange  

I want to create a new column called FruitsDiff, which combines columns Fruits1 and Fruits2 like so:

# A tibble: 5 × 4
ID    Fruits1   Fruits2   FruitsDiff      
<chr> <chr>     <chr>     <chr>           
1 A     orange    apple     apple-orange    
2 B     apple     orange    apple-orange    
3 C     pineapple apple     apple-pineapple 
4 D     apple     pineapple apple-pineapple 
5 E     pineapple orange    orange-pineapple

My requirement for this new column is that the fruits be ordered alphabetically, regardless of which fruit comes first in the data frame (so for example, for line 1, even if orange comes before apple, the variable in FruitsDiff is apple-orange). I usually use paste() to combine columns, but this is not going to help me in this situation.

Any suggestions? Also, the separator between the two variables can be anything, I just used a dash for the sake of the example.


Solution

  • We get the elementwise min/max between the 'Fruit' columns with pmin/pmax (where the min/max will be based on alphabetic order) and paste (str_c) the output from those functions to create the new column 'FruitsDiff'

    library(dplyr)
    library(stringr)
    data %>% 
     mutate(FruitsDiff = str_c(pmin(Fruits1, Fruits2), 
          pmax(Fruits1, Fruits2), sep = '-'))
    

    -output

    # A tibble: 5 × 4
      ID    Fruits1   Fruits2   FruitsDiff      
      <chr> <chr>     <chr>     <chr>           
    1 A     orange    apple     apple-orange    
    2 B     apple     orange    apple-orange    
    3 C     pineapple apple     apple-pineapple 
    4 D     apple     pineapple apple-pineapple 
    5 E     pineapple orange    orange-pineapple
    

    Or in base R, use the same pmin/pmax to paste

    data$FruitsDiff <- with(data, paste(pmin(Fruits1, Fruits2), 
          pmax(Fruits1, Fruits2), sep = '-'))
    

    Or with apply and MARGIN = 1, loop over the rows, sort the elements and paste with collapse as argument

    data$FruitsDiff <-  apply(data[-1], 1, \(x) paste(sort(x), collapse = '-'))