Tags: regex, regex-lookarounds, regex-negation

Regex to prevent certain ranges


I currently have the following regex:

ZL[^0-9].{16}_.{3}PAD_N.{26}\.PIC

which matches filenames like

ZLF_1177_0771428479_534PAD_N0530130SALP09217_1100LMV01.PIC

but I would like to change it so that the 9 characters at the position of SALP09217 cannot fall in the ranges

SALP00000-00899, SALP01000-03099 and SALP05000-06999

(Note that SALT00000-00899, or any other prefix than SALP, is allowed; only values that start with SALP are to be excluded.)

The following regex works partially

ZL[^0-9].{16}_.{3}PAD_N.{7}(?!(SALP00[0-8][0-9][0-9])|(SALP0[1-3]0[0-9][0-9])|(SALP0[5-6][0-9][0-9][0-9])).*\.PIC

but it accepts strings longer than the original regex would. For example, it allows

ZLF_1177_0771428479_534PAD_N0530130SALP09217_1100LMV01.PIC

which is correct but also

ZLF_1177_0771428479_534PAD_N0530130SALP09217_1100LMV01LARGER.PIC

which is not

The "ideal" regex would be

ZL[^0-9].{16}_.{3}PAD_N.{7}(?!(SALP00[0-8][0-9][0-9])|(SALP0[1-3]0[0-9][0-9])|(SALP0[5-6][0-9][0-9][0-9])).{10}\.PIC

but

ZLF_1177_0771428479_534PAD_N0530130SALP09217_1100LMV01.PIC

will not be a match.

Any suggestions?


Solution

  • The negative lookahead can be much simpler than the one in Wiktor's answer.

    Given exclusion ranges:

    SALP00000-00899
    SALP01000-03099
    SALP05000-06999
    

    it is clear that all start with SALP0 and end with [0-9]{2}.

    Then the remaining two digits are:

    0[0-8]
    1[0-9] 2[0-9] 30
    5[0-9] 6[0-9]
    

    which can be regrouped:

    0[0-8]
    1[0-9] 2[0-9] 5[0-9] 6[0-9]
    30
    

    and combined into: 0[0-8]|[1256][0-9]|30.

    So the whole negative lookahead is just:

    (?!SALP0(?:0[0-8]|[1256][0-9]|30)[0-9]{2})
    

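    It is incorporated by splitting .{26} in two at the offset where the SALP field starts: .{7} before the lookahead and .{19} after it. Lookarounds consume no characters, so the total length is unchanged (7 + 19 = 26), which is why the .{10} in the question's "ideal" regex comes up short: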

    ZL[^0-9].{16}_.{3}PAD_N.{7}(?!SALP0(?:0[0-8]|[1256][0-9]|30)[0-9]{2}).{19}\.PIC
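
For anyone who wants to check the derivation rather than take it on trust, here is a minimal Python sketch. It assumes the whole filename must match (hence `fullmatch`, which the question does not state explicitly), and `make_name`, `in_excluded_range` and the other names are hypothetical helpers invented for this test, not anything from the question. It brute-forces all five-digit codes against the three ranges and then tries a few filenames built from the question's sample.

```python
import re

# Final pattern from the answer: .{26} split into .{7} + lookahead + .{19}.
PATTERN = re.compile(
    r"ZL[^0-9].{16}_.{3}PAD_N.{7}"
    r"(?!SALP0(?:0[0-8]|[1256][0-9]|30)[0-9]{2})"
    r".{19}\.PIC"
)

# The body of the lookahead on its own, to compare with the stated ranges.
EXCLUDED = re.compile(r"SALP0(?:0[0-8]|[1256][0-9]|30)[0-9]{2}")

# Inclusive ranges quoted in the question.
RANGES = [(0, 899), (1000, 3099), (5000, 6999)]


def in_excluded_range(n: int) -> bool:
    """True if n lies in one of the ranges the question wants rejected."""
    return any(lo <= n <= hi for lo, hi in RANGES)


def make_name(code: str, prefix: str = "SALP", tail: str = "_1100LMV01") -> str:
    """Hypothetical helper: build a filename in the question's layout."""
    return f"ZLF_1177_0771428479_534PAD_N0530130{prefix}{code}{tail}.PIC"


# Every code 00000..99999 where the regex and the ranges disagree.
mismatches = [
    n for n in range(100000)
    if bool(EXCLUDED.fullmatch(f"SALP{n:05d}")) != in_excluded_range(n)
]
print("codes where regex and ranges disagree:", len(mismatches))

samples = [
    make_name("09217"),                      # sample from the question
    make_name("00500"),                      # inside SALP00000-00899
    make_name("03100"),                      # just past SALP01000-03099
    make_name("00500", prefix="SALT"),       # non-SALP prefix
    make_name("09217", tail="_1100LMV01LARGER"),  # too long
]
for name in samples:
    print(bool(PATTERN.fullmatch(name)), name)
```

If the regrouping above is right, the mismatch list should come out empty, and only the in-range SALP name and the over-long name should be rejected.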