r regex string

Using gsub to extract all information before and after the first white space


I have a character vector of names. For each element I want to extract everything before the first space and everything after the first space.

myNames
"Brown R A" "Davis T" "Polter L"

To do this I have tried:

gsub("(.*) (.*)", "\\2 \\1", myNames)

This is fine for names with only one initial but doesn't behave correctly when there are more. For example, the first name comes back as "A Brown R", whereas I am trying to get "R A Brown".

I also tried splitting everything on the first whitespace using gsub("(.*)^\\s(.*)", "\\2 \\1", myNames), but that didn't seem to change anything.


Solution

  • Your first group (.*) is too greedy: it matches as much as it can, so the split happens at the last space rather than the first. Assuming last names have at least 2 characters, you can use the {n,} repetition quantifier:

    gsub("([a-z]{2,}) (.*)", "\\2 \\1", myNames, ignore.case=TRUE)
    

    Which gives:

    [1] "R A Brown" "T Davis"   "L Polter"
    

    And for names with apostrophes ('), e.g.

    myNames <- c("Brown R A", "Davis T", "Polter L", "O'Brien M")
    

    you may need:

    gsub("(\\S{2,}) (.*)", "\\2 \\1", myNames, ignore.case=TRUE)
    
    [1] "R A Brown" "T Davis"   "L Polter"  "M O'Brien"
    

    Here \\S matches any character that is not whitespace, so (\\S{2,}) captures a run of two or more non-whitespace characters, apostrophes included.
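
If the two-character assumption is too strong, another option is to anchor the pattern at the start of the string and split at the first run of whitespace. The following is a minimal sketch and not part of the original answer. The variable names greedy and swapped are only illustrative, and sub() is used instead of gsub() because each string needs a single replacement.

```r
myNames <- c("Brown R A", "Davis T", "Polter L", "O'Brien M")

# Greedy (.*) gives back only as much as it must, so the split lands on
# the LAST space in each string.
greedy <- sub("(.*) (.*)", "\\2 \\1", myNames)
greedy
# [1] "A Brown R" "T Davis"   "L Polter"  "M O'Brien"

# ^(\\S+) takes the first run of non-whitespace (the surname),
# \\s+ consumes the first whitespace run, and (.*)$ keeps the rest.
swapped <- sub("^(\\S+)\\s+(.*)$", "\\2 \\1", myNames)
swapped
# [1] "R A Brown" "T Davis"   "L Polter"  "M O'Brien"
```

Strings that contain no whitespace do not match the pattern and are returned unchanged.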