excelexcel-formula

Extract specific value within cell (RegEx Pattern)


maybe someone of you can help me or maybe point me in the right direction how to solve this problem in excel.

I have a sheet with cells containing mixed values like:

texxtext W.1234.56 text

text text U.3853.23

P.3853.34 texttext

I just need the specific ArticleNumber (X.XXXX.XX) in between. Like this RegEx Pattern ([A-Za-z].\d\d\d\d.\d\d)

Is there a way to achieve this with Formular Functions?

Thanks in advance!

[A-Za-z].\d\d\d\d.\d\d


Solution

  • PY():

    If applicable, you can use the PY() function for this:

    enter image description here

    Formula in B1:

    =PY(
        xl("A1:A3").replace(r'.*\b([A-Za-z]\.\d{4}\.\d{2})\b.*', r'\1', regex=True)[0].values
    )
    

    At time of writing PY() is only available within the beta-channel of a ms365 subscription.


    SEARCH():

    Otherwise, try something more simple first to see if it would be sufficient to use wildcard in a regular SEARCH() pattern:

    =MID(A1:A3,SEARCH("?.????.??",A1:A3),9)
    

    With this method I assume there is no, or a very slim, chance that there are other substring in your text that follow that exact pattern.


    FILTERXML():

    Another, more intricate way would be to use FILTERXML(). That way you can utilize xpath-patterns that can check for the pattern at hand:

    =FILTERXML("<t><s>"&SUBSTITUTE(A1:A3," ","</s><s>")&"</s></t>","//s[string-length()=9][concat(substring(., 2, 1),substring(.,7,1))='..'][translate(substring(.,1,1),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','')=''][concat(substring(.,2,4),substring(.,8))*0=0]")
    

    Here the and-ing structure of predicates mean:

    Here I have assumed a space-delimited string to begin with. We use the spaces to split the text into nodes. This is not as straightforward nomore if punctuation comes into play.

    ‡‡ More information on FILTERXML() here