java regex string

Regular expression - Negative lookbehind


Using the following expression:

(?<!XYZ\d{8})(?>REF[A-Z]*)?(\d{3}+)(\d{6}+)(\d{3}+)

I am getting unexpected matches. Please could you explain why the following matches occur:

Weirdly enough, if I use XYZ12345678REF123456789876 as input, it returns a match on 123456789876 but not on REF123456789876. It correctly ignores the XYZ12345678, but it doesn't pick up the optional REF characters.

Basically, what I want to achieve is to extract a 12-digit identifier from a string that contains two identifiers. The first identifier has the format XYZ\d{8} and the second identifier has the format (?>REF[A-Z]*)?(\d{3}+)(\d{6}+)(\d{3}+)

To avoid a match on the wrong 12 digits in a string such as XYZ12345678123456789123, I want to say: get the twelve digits as long as they are not part of an XYZ\d{8}-type identifier.
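
A minimal sketch that reproduces the behaviour described above, assuming Java's java.util.regex. The class name OriginalPatternRepro and the helper firstMatch are illustrative only and not part of the original code. It reports where the first match starts, which shows that the match begins after REF in the first input and inside the XYZ identifier in the second.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalPatternRepro {
    // The expression from the question, unchanged.
    static final Pattern ORIGINAL =
            Pattern.compile("(?<!XYZ\\d{8})(?>REF[A-Z]*)?(\\d{3}+)(\\d{6}+)(\\d{3}+)");

    // Hypothetical helper: returns "start:text" for the first match, or "none".
    static String firstMatch(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.find() ? m.start() + ":" + m.group() : "none";
    }

    public static void main(String[] args) {
        // The lookbehind rejects index 11 (where REF starts), so the match begins at 14.
        System.out.println(firstMatch("XYZ12345678REF123456789876")); // prints 14:123456789876
        // Index 3 is not preceded by XYZ plus 8 digits, so the wrong 12 digits match.
        System.out.println(firstMatch("XYZ12345678123456789123"));    // prints 3:123456781234
    }
}
```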

Edit

Here are a couple of examples of what I want to achieve:

XYZ12345678123456789123 match on 123456789123
123456789123 match on 123456789123
XYZ12345678REF123456789123 match on REF123456789123
12345678912 no match because not 12 digits
REF123456789123 match on REF123456789123
REF12345678912 no match because not 12 digits
XYZ12345678123456789123ABC match on 123456789123
XYZ123456789123 no match
XYZ1234567891234 no match

Solution

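  • You were almost there. Change (?<!XYZ\\d{8}) to (?<!XYZ\\d{0,7}). You need to check that your match is not part of the preceding XYZ\\d{8} identifier, which means it can't have XYZ followed by zero to seven digits immediately before it. The original lookbehind only rejects the position directly after the complete XYZ\\d{8} identifier, which is exactly where REF starts, so the REF prefix was skipped and the match began at the first digit after it.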


    Demo based on your examples

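    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Demo {
        public static void main(String[] args) {
            // Inputs from the question; each comment gives the expected match
            String[] data = {
                "XYZ12345678123456789123",          // 123456789123
                "123456789123",                     // 123456789123
                "XYZ12345678REF123456789123",       // REF123456789123
                "12345678912",                      // no match because not 12 digits
                "REF123456789123",                  // REF123456789123
                "REF12345678912",                   // no match because not 12 digits
                "XYZ12345678123456789123ABC",       // 123456789123
                "XYZ123456789123",                  // no match
                "XYZ1234567891234",                 // no match
            };

            // Reject any start position preceded by XYZ plus 0 to 7 digits,
            // i.e. any position inside an XYZ\d{8} identifier
            Pattern p = Pattern.compile("(?<!XYZ\\d{0,7})(?>REF[A-Z]*)?(\\d{3}+)(\\d{6}+)(\\d{3}+)");
            for (String s : data) {
                System.out.printf("%-30s", s);
                Matcher m = p.matcher(s);
                while (m.find()) {
                    System.out.print("match: " + m.group());
                }
                System.out.println();
            }
        }
    }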
    

    output:

    XYZ12345678123456789123       match: 123456789123
    123456789123                  match: 123456789123
    XYZ12345678REF123456789123    match: REF123456789123
    12345678912                   
    REF123456789123               match: REF123456789123
    REF12345678912                
    XYZ12345678123456789123ABC    match: 123456789123
    XYZ123456789123               
    XYZ1234567891234
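
To see why the changed lookbehind works, it can help to list the start positions each lookbehind forbids. The following is only a sketch under the assumption that the lookbehind is tested on its own as a zero-width pattern. The class name LookbehindPositions and the helper rejectedStarts are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindPositions {
    // Hypothetical helper: returns every index at which the zero-width
    // lookbehind fails, i.e. where a match is not allowed to start.
    static List<Integer> rejectedStarts(String lookbehind, String input) {
        boolean[] allowed = new boolean[input.length() + 1];
        Matcher m = Pattern.compile(lookbehind).matcher(input);
        while (m.find()) {              // empty matches are found at each passing index
            allowed[m.start()] = true;
        }
        List<Integer> rejected = new ArrayList<>();
        for (int i = 0; i <= input.length(); i++) {
            if (!allowed[i]) {
                rejected.add(i);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        String input = "XYZ12345678REF123456789123";
        // Original: only the index right after XYZ + 8 digits is forbidden (where REF starts).
        System.out.println(rejectedStarts("(?<!XYZ\\d{8})", input));   // prints [11]
        // Fixed: every index inside the XYZ identifier is forbidden, index 11 is allowed.
        System.out.println(rejectedStarts("(?<!XYZ\\d{0,7})", input)); // prints [3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

With the original lookbehind the only forbidden start is index 11, so REF can never begin a match there, while the digits inside the XYZ identifier remain valid starting points. The changed lookbehind inverts this: the digits inside the identifier are forbidden and index 11 becomes available.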