Java Regex causes hung thread


Pattern:

"(([^",\n  ]*[,\n  ])*([^",\n  ]*"{2})*)*[^",\n  ]*"[  ]*,[  ]*|[^",\n]*[  ]*,[  ]*|"(([^",\n  ]*[,\n  ])*([^",\n  ]*"{2})*)*[^",\n  ]*"[  ]*|[^",\n]*[  ]*

This regex is for parsing a CSV file. But when it is passed to Pattern.matcher, the thread hangs and the server logs the hung thread warning below. I would appreciate it if someone could help fine-tune this pattern.

[7/1/13 16:45:26:745 GMT+08:00] 00000029 ThreadMonitor W   WSVR0605W: Thread "MessageListenerThreadPool : 0" (00000035) has been active for 691836 milliseconds and may be hung.  There is/are 1 thread(s) in total in the server that may be hung.
at java.util.regex.Pattern$Curly.match(Pattern.java:4233)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4752)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.match(Pattern.java:4733)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4665)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4754)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$Loop.match(Pattern.java:4742)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4665)
at java.util.regex.Pattern$BitClass.match(Pattern.java:2912)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4278)
at java.util.regex.Pattern$Curly.match(Pattern.java:4233)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4752)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)

Solution

  • Description

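    The problem appears to be the sheer amount of backtracking needed to complete the match. The pattern nests quantified groups inside another quantified group, (( ... )*( ... )*)*, so the engine can split the same input among them in a great many ways before giving up.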
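
    While the pattern is being reworked, one defensive option is to bound how long a single match may run, so that heavy backtracking surfaces as an exception rather than a hung thread. The sketch below is not from the original answer: DeadlineCharSequence and withTimeoutMillis are hypothetical names, and it assumes the matcher reads its input through CharSequence.charAt, which is where the deadline is checked.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper (not part of java.util.regex): delegates to the real
// input but aborts once a deadline has passed.
final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos;

    private DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
    }

    // Assumed entry point: wrap the input with a time budget in milliseconds.
    static DeadlineCharSequence withTimeoutMillis(CharSequence inner, long timeoutMillis) {
        return new DeadlineCharSequence(inner, System.nanoTime() + timeoutMillis * 1_000_000L);
    }

    @Override
    public char charAt(int index) {
        // The regex engine reads characters through this method, so a
        // runaway match ends up here again and again.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IllegalStateException("regex match exceeded its time budget");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Keep the same deadline for any sub-range of the input.
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override
    public String toString() {
        return inner.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("[^,]*");
        Matcher m = p.matcher(withTimeoutMillis("abc,def", 1000));
        System.out.println(m.find() ? m.group() : "no match");
    }
}
```

    This only limits the damage; it does not make the pattern any faster, and the clock check on every character read adds some overhead.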

    If your CSV is well formed, you could use a simpler regex to parse each line. Note that it only separates the quoted and unquoted comma-delimited values within a single string, so you would need to pass each line through .matcher with this regex and iterate over the matches.

    regex: (?:^|,)"?((?<=")[^"]*|[^,"]*)"?(?=,|$)

    Java Code Example:

    Live example: http://ideone.com/NBmzrk

    Sample Text

    "root",test1,1111,"22,22",,fdsa
    

    Code

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Module1 {
      public static void main(String[] args) {
        // One CSV line (the sample text above); run each line of the file through the matcher in turn.
        String sourcestring = "\"root\",test1,1111,\"22,22\",,fdsa";
        // Capture group 1 holds the field value without its surrounding quotes.
        Pattern re = Pattern.compile("(?:^|,)\"?((?<=\")[^\"]*|[^,\"]*)\"?(?=,|$)");
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
          // Group 0 is the whole match, including any leading comma; group 1 is the field.
          for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
            System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
          }
          mIdx++;
        }
      }
    }
    

    Capture Group 1

    [0] => root
    [1] => test1
    [2] => 1111
    [3] => 22,22
    [4] => 
    [5] => fdsa
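
    The pattern in the question also accepts doubled quotes ("") inside a quoted field, which the simpler regex above does not. If that matters, a small hand-written scanner is an alternative to a regex: it reads the line once from left to right, so there is nothing to backtrack over. The sketch below is not from the original answer; CsvLineSplitter and split are hypothetical names, and it assumes one record per line with no whitespace trimming around fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a single CSV line without using a regex.
final class CsvLineSplitter {
    private CsvLineSplitter() {}

    // Returns the fields of one CSV line, scanning it once left to right.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote: keep one literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);        // start the next field
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        // Same sample line as above, then one with doubled quotes.
        System.out.println(split("\"root\",test1,1111,\"22,22\",,fdsa"));
        System.out.println(split("\"say \"\"hi\"\"\",x"));
    }
}
```

    For the sample line this yields the same six fields as capture group 1 above. Quoted fields that span several lines are not handled; that would require feeding the scanner the whole record rather than one line at a time.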