google-sheetsarray-formulas

How to split array of strings by a delimiter with more than 1 characters


I have a column of strings of the format "#.# km, # mins" in column F. (The spaces are literal. # is for numeric. The number of #s is not literal.) For example,

F
21.2 km, 27 mins

I understand that the following formula can separate the km part and the mins part.

=let(rmLspace,lambda(str,if(left(str,1)<>" ",str,right(str,len(str)-1)))
    ,ARRAYFORMULA(if(isblank(F2:F),,rmLspace(split(F2:F,",")))))

For example,

K J
21.2 km 27 mins

But the above solution leaves something to be desired due to 1) its complexity, 2) reliance on the 2nd delimiter character being a non-word character. What I mean is, " " is not a part of any potential words that carry meaning in the use case. Had the delimiter been ",a", the formula would have to be modified to something even more complex than it already is.

How do you split string in a cell by more than 1 character in a way that can be applied to a whole array via arrayformula()?

The simpler the formula is, the better, because my intended use cases are meant for very quick manipulation of text for visual purposes, and not mission critical. I would appreciate a simpler method for use in the future.


Solution

  • Assuming that the separator is always a comma and a space, you can split by those two characters like this:

    =arrayformula(iferror(split(F2:F, ", ", false)))
    

    You can also use capture groups, like this:

    =arrayformula(regexextract(tocol(F2:F, 1), "([\d.]+ km), ([\d.]+ mins)"))
    

    To get numeric results sans the km and mins, use this:

    =arrayformula(value(regexextract(tocol(F2:F, 1), "([\d.]+)[^\d]+([\d.]+)")))
    

    See split(), regular expressions quickstart, regexextract(), RE2, value() and tocol().