I have a character vector of names where I wish to extract everything before the first space and everything after the first space, so that the two parts can be swapped.
myNames
"Brown R A" "Davis T" "Polter L"
To do this I have tried:
gsub("(.*) (.*)", "\\2 \\1", myNames)
This is fine for any name with only one initial, but doesn't perform correctly when there are more. For example, here the first name is returned as "A Brown R", whereas I am trying to get "R A Brown".
I had a look at trying to split everything on the first white space using gsub("(.*)^\\s(.*)", "\\2 \\1", myNames)
but that didn't seem to change anything.
Your first regex, (.*), is too greedy: it consumes as much as it can, so it runs up to the last space rather than the first. Assuming last names have at least 2 characters, you can use the {n,} repetition quantifier:
gsub("([a-z]{2,}) (.*)", "\\2 \\1", myNames, ignore.case=TRUE)
Which gives:
[1] "R A Brown" "T Davis" "L Polter"
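To make the greediness concrete, here is a minimal sketch comparing the original pattern with one anchored at the start of the string. The anchored pattern is not from the answer above; it is an illustrative alternative that assumes each name contains exactly one surname followed by single spaces, and the variable names greedy and first_space are made up for the example.

```r
myNames <- c("Brown R A", "Davis T", "Polter L")

# Greedy: the first (.*) runs up to the LAST space,
# so only the final initial is moved to the front.
greedy <- gsub("(.*) (.*)", "\\2 \\1", myNames)

# Anchored alternative (an assumption, not the answer's pattern):
# ^([^ ]+) can only match the leading run of non-space characters,
# so the split always happens at the FIRST space.
first_space <- sub("^([^ ]+) (.*)$", "\\2 \\1", myNames)

print(greedy)       # "A Brown R" "T Davis" "L Polter"
print(first_space)  # "R A Brown" "T Davis" "L Polter"
```

The anchored form does not rely on the surname having at least 2 characters, but it does rely on the surname being the first token.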
And for names with apostrophes ('), e.g.
myNames <- c("Brown R A", "Davis T", "Polter L", "O'Brien M")
you may need:
gsub("(\\S{2,}) (.*)", "\\2 \\1", myNames, ignore.case=TRUE)
[1] "R A Brown" "T Davis" "L Polter" "M O'Brien"
Here \\S matches any character that is not whitespace, so the apostrophe is kept as part of the last name.
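As a rough illustration of why the character class matters, the sketch below runs both patterns on the same vector. The variable names letters_only and non_space are hypothetical, chosen only for this example.

```r
myNames <- c("Brown R A", "Davis T", "Polter L", "O'Brien M")

# [a-z]{2,} cannot match across the apostrophe, so the first run of
# two or more letters in "O'Brien M" is "Brien", and "O'" is left in place.
letters_only <- gsub("([a-z]{2,}) (.*)", "\\2 \\1", myNames, ignore.case = TRUE)

# \\S{2,} accepts any non-whitespace character, apostrophe included,
# so the whole surname is captured.
non_space <- gsub("(\\S{2,}) (.*)", "\\2 \\1", myNames)

print(letters_only)  # fourth element is "O'M Brien"
print(non_space)     # fourth element is "M O'Brien"
```

Note that ignore.case is not needed with \\S, since that class is not case sensitive to begin with.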