Using the following expression:
(?<!XYZ\d{8})(?>REF[A-Z]*)?(\d{3}+)(\d{6}+)(\d{3}+)
I am getting unexpected matches. Please could you explain why the following matches occur:
XYZ12345678123456789123
- Matches on 123456781234
- I was expecting it to only match on 123456789123
because it is the only sequence not preceded by (?<!XYZ\d{8})
Weirdly enough, if i use XYZ12345678REF123456789876
as input, it returns a match on 123456789876
but not REF123456789876
. It correctly ignored the XYZ12345678
, but it didn't pick up the optional REF characters.
Basically what i want to achieve is to extract a 12 digit identifier from a string that contains two identifiers. The first identifier has the format XYZ\d{8}
and the second identifier has the format (?>REF[A-Z]*)?(\d{3}+)(\d{6}+)(\d{3}+)
To avoid a match on the wrong 12 digits in a string such as XYZ12345678123456789123
, i want to say - get the twelve digits as long as the digits are not part of an XYZ\d{8}
type identifier.
Here are a couple of examples of what i want to achieve
XYZ12345678123456789123 match on 123456789123
123456789123 match on 123456789123
XYZ12345678REF123456789123 should match on REF123456789123
12345678912 no match because not 12 digits
REF123456789123 match on REF123456789123
REF12345678912 no match because not 12 digits
XYZ12345678123456789123ABC match on 123456789123
XYZ123456789123 No match
XYZ1234567891234 no match
You ware almost there. Change (?<!XYZ\\d{8})
to (?<!XYZ\\d{0,7})
. You need to check if your match is not part of previous identifier XYZ\\d{8}
which means it cant have
XYZ
XYZ1
XYZ12
XYZ1234567
before it.
Demo based on your examples
String[] data ={
"XYZ12345678123456789123", //123456789123
"123456789123", //123456789123
"XYZ12345678REF123456789123 ", //REF123456789123
"12345678912", //no match because not 12 digits
"REF123456789123", //REF123456789123
"REF12345678912", //no match because not 12 digits
"XYZ12345678123456789123ABC", //123456789123
"XYZ123456789123", //no match
"XYZ1234567891234", //no match
};
Pattern p = Pattern.compile("(?<!XYZ\\d{0,7})(?>REF[A-Z]*)?(\\d{3}+)(\\d{6}+)(\\d{3}+)");
for (String s:data){
System.out.printf("%-30s",s);
Matcher m = p.matcher(s);
while (m.find())
System.out.print("match: "+m.group());
System.out.println();
}
output:
XYZ12345678123456789123 match: 123456789123
123456789123 match: 123456789123
XYZ12345678REF123456789123 match: REF123456789123
12345678912
REF123456789123 match: REF123456789123
REF12345678912
XYZ12345678123456789123ABC match: 123456789123
XYZ123456789123
XYZ1234567891234