I'm trying to get the list of all digits preceding a hyphen in a given string (let's say in cell A1
), using a Google Sheets regex formula :
=REGEXEXTRACT(A1, "\d-")
My problem is that it only returns the first match... how can I get all matches?
Example text:
"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"
My formula returns 1-
, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3-
(either as an array or concatenated text).
I know I could use a script or another function (like SPLIT
) to achieve the desired result, but what I really want to know is how I could get a re2 regular expression to return such multiple matches in a "REGEX.*
" Google Sheets formula.
Something like the "global - Don't return after first match" option on regex101.com
I've also tried removing the undesired text with REGEXREPLACE
, with no success either (I couldn't get rid of other digits not preceding a hyphen).
Any help appreciated! Thanks :)
=regexreplace(A1,".?(\d-)|.", "$1")
Which returns 1-2-2-2-2-2-2-2-2-2-3-3-
for "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"
.
Explanation of regex:
.?
-- optional character(\d-)
-- capture group 1 with a digit followed by a dash (specify (\d+-)
multiple digits)|
-- logical or.
-- any character"$1"
uses just the capture group 1, and discards anything elseLearn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex