I looked into the LeetCode problem 49. Group Anagrams:
Given an array of strings
strs
, group the anagrams* together. You can return the answer in any order.* An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once.
Here's my solution in PHP, but it doesn't pass all tests. For instance, it fails for the input ["ddddddddddg","dgggggggggg"]
. Also, it becomes too slow when the input array is large.
Most examples I found are in Python and so much shorter using Sets.
What can I do to make this PHP solution work and more efficiently so?
class Solution {
/**
* @param String[] $strs
* @return String[][]
*/
function groupAnagrams($strs) {
$result = [];
$alreadyPlaced = [];
$currentAnagrams = [];
$hashed = [];
foreach($strs as $key => $str) {
// create assoc array mapping letter to index of str its from
if ( array_key_exists($key, $alreadyPlaced) ) continue;
// mark this as placed
$alreadyPlaced[$key] = true;
// initiatilize new array and place into new array
$currentAnagrams = array();
array_push($currentAnagrams, $str);
// to be able to determine anagrams, hash each letter in the newly found string
$hashed = array();
foreach (str_split($str) as $c) {
$hashed[$c] = true;
}
//loop thru all strings skipping those of not same length or already accounted for
foreach($strs as $checkKey => $checkVal) {
$breakout = false;
if (mb_strlen($checkVal) != mb_strlen($str)) continue;
if (array_key_exists($checkKey, $alreadyPlaced)) continue;
foreach (str_split($checkVal) as $c) {
if (!$hashed[$c]) {
$breakout = true;
continue;
}
}
if ($breakout) continue;
// if it got here, its an anagram
array_push($currentAnagrams, $checkVal);
$alreadyPlaced[$checkKey] = true;
}
//add new palindrome array into $result array
array_push($result, $currentAnagrams);
}
return $result;
}
Your code does not solve the problem: it doesn't account for duplicate letters.
For instance, take this input:
["aaabb","aaaab","abbbb","aabbb","ababa"]
It is expected to return an array with four subarrays, but your code puts all strings in the same subarray.
It is not enough to set a boolean for each unique letter in a string and use that as a hash. That doesn't account for how many times a letter occurs, and will give false positives.
You could use array_count_values
to count the occurrences of each letter and produce an associative array with letters as keys and counts as values.
Then sort this associative array by its keys -- which takes constant time, since there are at most 26 entries in it.
Finally encode this associative array as JSON (or use any other serialisation function, like serialize
), which uniquely defines the group a string belongs to, and can be used as key in the result array.
function groupAnagrams($strs) {
$groups = [];
foreach ($strs as $str) {
$count = array_count_values(str_split($str));
ksort($count); // As there are at most 26 entries, this is O(1)
$groups[json_encode($count)][] = $str; // Serialize the key
}
return $groups;
}