
Regex Quantifier: Which Number of Occurrences Gets Tested First?


I'm completely new to regex and only recently started learning it. Here's part of my test string, in which I'd like to find matches.

24 bit:
Black #000000
12 bit:
Black #000

My question is the following. When I use the regex #(\w{1,2}), the group matches 00 in both the 24-bit Black and the 12-bit Black. However, when I use the regex #(\w{1,2})\1\1, the group matches 00 in the 24-bit Black but only 0 in the 12-bit Black. I'm not familiar with how regex works internally, so I'm curious about the logic behind this.

When I use the curly-brace quantifier {a,b} to mean a <= (number of occurrences) <= b, which of the counts a, a+1, ..., b is checked first? For example, with #(\w{1,2}) it seems that 2 occurrences are tried first. But after adding \1\1, it looks as if the regex somehow figured out that using 1 occurrence instead of 2 would make the 12-bit Black match?
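The two observations can be reproduced in a few lines, assuming Python's re module as the regex engine (the question does not name one, and the helper name groups is hypothetical):

```python
import re


def groups(line):
    # Group 1 of #(\w{1,2}) on its own: the quantifier takes as much as it can.
    plain = re.search(r"#(\w{1,2})", line)
    # Group 1 when the same text must then appear twice more (\1\1).
    repeated = re.search(r"#(\w{1,2})\1\1", line)
    return plain.group(1), repeated.group(1)


for line in ["Black #000000", "Black #000"]:
    print(line, groups(line))
```

This prints ('00', '00') for the 24-bit line and ('00', '0') for the 12-bit line, the same groups described above.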


Solution

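  • The pattern #(\w{1,2})\1\1 can match both #000000 and #000 because of backtracking. The quantifier {1,2} is greedy, so \w{1,2} first tries the maximum of 2 characters. For #000, taking 00 leaves only a single 0, but \1\1 needs 0000, so the rest of the pattern fails. The engine then backtracks, gives back 1 character and retries with 1 occurrence: the group becomes 0, and \1\1 matches the remaining 00. In general, a greedy {a,b} first tries as many occurrences as possible (up to b) and works down to a, while a lazy {a,b}? starts at a and works up.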

    You could make the pattern a bit more specific by allowing only hexadecimal digits:

    #([0-9a-fA-F]{1,2})\1\1
    

    Or, if the match should not be directly surrounded by non-whitespace characters:

    (?<!\S)#([0-9a-fA-F]{1,2})\1\1(?!\S)
    

    See a regex101 demo.
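The order in which a greedy or lazy quantifier tries its counts can be made visible with a small hand-written matcher. This is a simplified model, not how any real regex engine is implemented: the function match_hash_backref is hypothetical, it only handles the shape #(\w{lo,hi})\1\1, and it records every occurrence count it tries, in order.

```python
import re

WORD_CHAR = re.compile(r"\w")  # same character class as \w in the pattern


def match_hash_backref(text, lo=1, hi=2, greedy=True):
    """Simplified model of searching for #(\\w{lo,hi})\\1\\1 in text.

    Returns (group, attempts): the captured group (or None) and every
    occurrence count that was tried, in the order it was tried.
    """
    attempts = []
    for start, ch in enumerate(text):
        if ch != "#":
            continue  # the pattern must start with a literal '#'
        body = start + 1
        # How many word characters follow the '#', capped at hi.
        avail = 0
        while avail < hi and body + avail < len(text) and WORD_CHAR.fullmatch(text[body + avail]):
            avail += 1
        top = min(hi, avail)
        # Greedy: most occurrences first, then back off one at a time.
        # Lazy: fewest occurrences first, then add one at a time.
        counts = range(top, lo - 1, -1) if greedy else range(lo, top + 1)
        for n in counts:
            attempts.append(n)
            group = text[body:body + n]
            # \1\1: the same text must appear twice more right after the group.
            if text.startswith(group * 2, body + n):
                return group, attempts
    return None, attempts


for s in ["#000000", "#000"]:
    print(s, "greedy:", match_hash_backref(s), "lazy:", match_hash_backref(s, greedy=False))
```

For #000 the greedy run tries 2 and then 1 before succeeding with the group 0, while for #000000 the first try of 2 already succeeds. The lazy run always starts at 1, which is why Python's re also captures 0 for #(\w{1,2}?)\1\1 on #000000.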
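The suggested patterns can also be compared against the original \w version with Python's re module, which supports the single-character lookbehind (?<!\S) and lookahead (?!\S) used above. This is a sketch assuming Python as the target engine; the sample string and the helper find_colors are hypothetical, made up for illustration.

```python
import re

# The \w version from the question, for comparison.
WORD_PATTERN = re.compile(r"#(\w{1,2})\1\1")
# Hex-only version: one or two hex digits, repeated three times in total.
HEX_PATTERN = re.compile(r"#([0-9a-fA-F]{1,2})\1\1")
# Same, but no non-whitespace character may touch the match on either side.
HEX_STANDALONE = re.compile(r"(?<!\S)#([0-9a-fA-F]{1,2})\1\1(?!\S)")


def find_colors(pattern, text):
    """Return every full match of pattern in text, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]


# Hypothetical sample: the question's lines plus a few near misses.
sample = "24 bit:\nBlack #000000\n12 bit:\nBlack #000\nNear misses: #ggg x#000 #0000000"

for name, pattern in [("word", WORD_PATTERN), ("hex", HEX_PATTERN), ("standalone", HEX_STANDALONE)]:
    print(name, find_colors(pattern, sample))
```

On this sample the \w pattern also accepts #ggg. The hex pattern rejects it but still finds #000 inside x#000 and #000000 inside #0000000, and the standalone version keeps only the two real colors.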