r sorting

Sort column in R: strings first (alphabetically), then numbers (numerically)


I have a dataframe in R with a column that contains both letters and numbers stored as strings (e.g. "A", "B", "1", "2"). I would like to sort the dataframe so that the letters come first (sorted alphabetically), followed by the numbers (sorted numerically). Ideally in a tidyverse way, but not necessarily.

gtools::mixedsort does almost what I want, but puts numbers before strings and I do not think there is an argument that allows you to push the numbers to the back.

I considered splitting the dataframe, sorting each part separately, and then binding the rows back together. But I am guessing there is a better way to do this?

Here is also an example to further clarify my question.

I have:

Col1    Col2   Col3
Apples     A     90
Pears     12     90
Bananas    C     50
Cake       1     50
Apples     A     90
Pears      B     90
Bananas    2     50
Cake     100     50

What I am trying to achieve is sorting by Col2, alphabetically first, then numerically:

Col1    Col2   Col3
Apples     A     90
Apples     A     90
Pears      B     90
Bananas    C     50
Cake       1     50
Bananas    2     50
Pears     12     90
Cake     100     50

Many thanks!


Solution

  • For a base R option:

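    # Key 1: FALSE (letters) sorts before TRUE (all-digit strings)
    # Key 2: every value left-padded with spaces to width 10, compared as text
    df <- data.frame(Col2=c("100", "B", "A", "Z", "10", "4"), stringsAsFactors=FALSE)
    df[order(grepl("^\\d+$", df$Col2), sprintf("%10s", df$Col2)), ]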
    
    [1] "A"   "B"   "Z"   "4"   "10"  "100"
    

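    The sort uses two keys. The first, grepl("^\\d+$", ...), is FALSE for letters and TRUE for all-digit strings, so letters are placed before numbers. The second left-pads every value to 10 characters and sorts ascending; note that sprintf("%10s", ...) pads with spaces, not zeroes. For the numbers this is effectively an ascending numeric sort. The trick is that number strings sort correctly as text when they all have the same width, provided the collation in use orders a space before the digits, as the C locale does. Because df has a single column here, the subset is returned as a plain character vector rather than a data frame.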
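As a variation on the same two-level idea, here is a minimal sketch that swaps the padded-string key for an explicit numeric key, so the result should not depend on how the locale collates spaces. It is applied to the example data from the question; the helper names is_num, num_key and sorted are made up for illustration and are not part of the original answer.

```r
# Example data from the question (Col2 mixes letters and digit strings)
df <- data.frame(
  Col1 = c("Apples", "Pears", "Bananas", "Cake", "Apples", "Pears", "Bananas", "Cake"),
  Col2 = c("A", "12", "C", "1", "A", "B", "2", "100"),
  Col3 = c(90, 90, 50, 50, 90, 90, 50, 50),
  stringsAsFactors = FALSE
)

# Key 1 (hypothetical helper): TRUE for all-digit strings, FALSE otherwise
is_num <- grepl("^\\d+$", df$Col2)

# Key 2 (hypothetical helper): numeric value for digit strings, 0 for the rest,
# so that all letter rows tie on this key and fall through to key 3
num_key <- numeric(nrow(df))
num_key[is_num] <- as.numeric(df$Col2[is_num])

# Key 3: the raw strings, which orders the letter rows alphabetically
sorted <- df[order(is_num, num_key, df$Col2), ]
print(sorted)
```

With this data, sorted$Col2 comes out as A, A, B, C, 1, 2, 12, 100, which matches the desired output in the question. Converting only the all-digit entries avoids the coercion warning that as.numeric() would raise on the letters.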