
Compare strings ignoring accented characters


I would like to know if there is a method that compares two strings while ignoring accents, so that "noção" is considered equal to "nocao". It would be something like string1.methodCompareIgnoreAccent(string2);


Solution

  • You can use Java's Collator class to compare strings while ignoring accents and case. Here is a simple example:

    import java.text.Collator;
    
    /**
     * @author Kennedy
     */
    public class SimpleTest
    {
    
      public static void main(String[] args)
      {
        String a = "nocao";
        String b = "noção";
    
        final Collator instance = Collator.getInstance();
    
        // PRIMARY strength: only differences between base letters count, so accents and case are ignored
        instance.setStrength(Collator.PRIMARY);
    
        // Prints 0 because the strings compare as equal at PRIMARY strength
        System.out.println(instance.compare(a, b));
      }
    }
    

    Documentation: JavaDoc

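    Be aware that this collator also ignores differences in case, i.e. it also treats "NOCAO" as equal to "noção". No strength level gives you "ignore accents but keep case" on its own: SECONDARY distinguishes accents but still ignores case, and TERTIARY distinguishes both. To create a collator that ignores accent differences but distinguishes case, you might be able to use a RuleBasedCollator.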

    Do not confuse Collator.setStrength() with Collator.setDecomposition(). The Collator constants PRIMARY, SECONDARY, TERTIARY and IDENTICAL must only be used with setStrength(), while the constants NO_DECOMPOSITION, CANONICAL_DECOMPOSITION and FULL_DECOMPOSITION must only be used with setDecomposition(). (A previous version of this code mixed this up and only worked because NO_DECOMPOSITION and PRIMARY happen to have the same integer value.)
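The two notes above can be put into code roughly as follows. This is a sketch, not part of the original answer: the class name AccentInsensitiveCompare and its helper methods are hypothetical, and Locale.ROOT is an assumed choice of locale. The collator helper passes the strength constant to setStrength() and the decomposition constant to setDecomposition(). Instead of a RuleBasedCollator, the case-sensitive helper swaps in a different technique: it decomposes the text with java.text.Normalizer (NFD) and strips the combining marks before calling equals().

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

public class AccentInsensitiveCompare {

    // Collator-based equality: ignores accents AND case.
    // Locale.ROOT is an assumption; use the locale whose collation rules you need.
    static boolean collatorEquals(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        // Strength constants (PRIMARY, SECONDARY, TERTIARY, IDENTICAL) belong to setStrength()
        collator.setStrength(Collator.PRIMARY);
        // Decomposition constants belong to setDecomposition(); canonical decomposition
        // makes precomposed and decomposed forms of the same accented letter compare alike
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    // Decompose to NFD, then drop Unicode mark characters (e.g. combining tilde, cedilla)
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Ignores accents but keeps case, because String.equals() is case-sensitive
    static boolean equalsIgnoreAccents(String a, String b) {
        return stripAccents(a).equals(stripAccents(b));
    }

    public static void main(String[] args) {
        System.out.println(collatorEquals("noção", "nocao"));      // true
        System.out.println(collatorEquals("noção", "NOCAO"));      // true: case is ignored too
        System.out.println(equalsIgnoreAccents("noção", "nocao")); // true
        System.out.println(equalsIgnoreAccents("noção", "NOCAO")); // false: case is kept
        // Both constants are 0, which is why passing one in place of the other can go unnoticed
        System.out.println(Collator.PRIMARY == Collator.NO_DECOMPOSITION); // true
    }
}
```

Keep in mind that the Normalizer approach only removes combining marks: letters such as "ø" or "ß" have no canonical decomposition and are left unchanged, whereas a collator applies its own locale-specific rules.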