javaregexrecursive-backtracking

Is my regex catastrophically backtracking?


I use a regex to validate number formats.

[-+]?([0-90-9]+((\,([0-90-9]{2,}))*\,([0-90-9]{3}))*)?(\.[0-90-9]*)? 

When I handled a large number of inputs for certain inputs it seems to loop infinitely .I read other answers regarding catastrophic backtracking . But I am a regex newbie and need some help. can you please provide any input that can make this regex catastrophically backtrack . Would be helpful for me to understand .Thanks .It can be a very long input too . I am using Java Pattern and matcher objects.


Solution

  • Yes, this regex is prone to catastrophic backtracking. Specifically, this segment:

    ((\,([0-9]{2,}))*\,([0-9]{3}))*
    

    For reference, this has a structure of the form

    ((,d)*,d)*
    

    which, simplified, is essentially (d+)*.

    Strings like

    1,111,111,111,111,111,111,111,111,111,111,111,111,111,11.
    

    will therefore cause catastrophic backtracking.