javaarrayscollectionshashmapanagram

How to group anagrams using collections efficiently?


I'm working on a Java problem where I need to group words that are anagrams of each other. For instance, given the input array:

["eat", "tea", "tan", "ate", "nat", "bat"]

The expected output should group anagrams together like this:

[["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]

The grouped words can be in any order, but anagrams must be together in a sub-list.

I attempted the following approach:

Here is the rough code I tried:

Map<String, List<String>> map = new HashMap<>();
for (String word : input) {
  char[] chars = word.toCharArray();
  Arrays.sort(chars);
  String key = new String(chars);
  map.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
}
List<List<String>> result = new ArrayList<>(map.values());

This approach works for basic cases, but I faced issues when duplicate entries are present or when empty strings are included in the input. I also want to understand if there's a more efficient or cleaner way to do this using Java collections.

I looked for solutions using TreeMap, Streams, and even PriorityQueue, but most examples were simplistic and didn't explain handling duplicates or edge cases.


Solution

  • You could consider Stream handling to read a list of words and collect into sets of matching anagram. You could filter out names with leading/trailing whitespace and empty strings if required.

    Collecting with Collectors.groupingBy for each sorted fields can be used to collect related words. Using Collectors.toSet() is more suitable than ArrayList for collecting matching words, as it will eliminate duplicates. For example:

    List<String> input = List.of("eat", "tea", "tan", "ate", "nat", "bat", "", " ", "eat", "tea", "tan", "ate", "nat", "bat", "");
    
    Collection<Set<String>> anagrams = input.stream()
            // Optional: strip whitespace:
            .map(String::trim)
            // Optional: eliminate empty String
            .filter(Predicate.not(String::isEmpty))
            // group by sorted word values, and collect results to a set
            .collect(Collectors.groupingBy(word -> {
                    char[] chars = word.toCharArray();
                    Arrays.sort(chars);
                    return new String(chars);
                }, Collectors.toSet()))
            .values();
    System.out.println(anagrams);
    

    Prints:

    [[tea, ate, eat], [bat], [tan, nat]]