javaregexperformancet9

T9 Algorithm too slow


I am trying to implement T9 to my android dialer. But it lags. I checked and generating the list of possible combinations is painless.

But I am trying to match using a pattern and I generated the pattern as follows

Pattern queryPattern;
List<String> names = T9Utils.possibleNames(query);
StringBuilder sb = new StringBuilder();
for (String name : names) {
    Matcher m = p.matcher(name.toLowerCase());
    sb.append("(");
    sb.append(m.replaceAll("($0)\\s*"));
    sb.append(")");
    if (!name.equals(names.get(names.size() - 1))) {
        sb.append("|");
    }
}
queryPattern = Pattern.compile("(?i).*(" + sb.toString() + ").*");
for (CallLogItem contact : allContacts) {
   Matcher nameM = queryPattern.matcher(contact.displayName);
   Matcher phoneM = queryPattern.matcher(contact.phoneNumber);

   if (nameM.matches()) {
      //TODO: Highlighting
      toBeDisplayed.add(contact);
    } else if (phoneM.matches()) {
      toBeDisplayed.add(contact);
    }
}

The pattern generated would be something like

(?i).*(((g)\\s*(d)\\s*)|((g)\\s*(e)\\s*)|((g)\\s*(f)\\s*)|((g)\\s*(3)\\s*)|((h)\\s*(d)\\s*)|((h)\\s*(e)\\s*)|((h)\\s*(f)\\s*)|((h)\\s*(3)\\s*)|((i)\\s*(d)\\s*)|((i)\\s*(e)\\s*)|((i)\\s*(f)\\s*)|((i)\\s*(3)\\s*)|((4)\\s*(d)\\s*)|((4)\\s*(e)\\s*)|((4)\\s*(f)\\s*)|((4)\\s*(3)\\s*)).*

Solution

  • First, you should use a character class instead of all these alternations.

    Then remove the whitespaces of the string you compare to. Take your name as an example: Amanuel Nega => AmanuelNega. You should use this conversion table (adapt it as needed):

    Input   | Corresponding class
    ---------------------------
    1       | [1.!]
    2       | [2abc]
    3       | [3def]
    4       | [4ghi]
    5       | [5klm]
    6       | [6nop]
    7       | [7qrst]
    8       | [8uvw]
    9       | [9xyz]
    0       | [0+]
    

    Let's say I type 4 followed by a 3 (which is what your output suggest).

    Then the output will be [4ghi][3def].

    If I want to match AmanuelNega I'll have to type 2526835 which will produce the regex:

    [2abc][5klm][2abc][6nop][8uvw][3def][5klm]
    

    Which will be waaaay faster than what you already have