rubyarraysanagram

Comparing two arrays containing strings for anagrams in Ruby


Forgive me if my code is off. I've still got my head in Ruby on Rails which seems to have subtle differences that are coming out as I learn more "just Ruby," although to be fair I'm not sure my code would pass muster in a Ruby on Rails format. I digress.

I am trying to compare two arrays that contain a set of strings. I want to do a couple things. 1) Ensure that the arrays are the same number of words, otherwise the exercise is moot. 2) Compare the first word in an array with only the first word in the second array. In other words, I never want to compare word 1 in array "a" with word 4 in array "b". I'm struggling to find a solution that reorders the characters in any given word, compares it to the reordered characters in the corresponding word in the second array, and prints 1 if it's an anagram (once sorted, the idea is that the two words would be equivalent) or 0 if they do not match.

In the example below, what I want it to print is:

0
0
1
1

...but that is not happening. Thoughts? I'm afraid this has to do with local variable issues, but I am not be sure.

    a = ['hello', 'goodbye', 'pants', 'baa']
    b = ['helio', 'godbye', 'spant', 'aba']

    x = a.length
    y = b.length
    z = 0

    x = y? do
        while z < x do
            if a.find(z).chars.sort.join == b.find(z).chars.sort.join
                puts 1
            else
                puts 0
            end

            z += 1
        end
    end

Solution

  • [Edit: I've edited my answer to incorporate an efficiency improvement suggested by @raph in a comment on the question (the method anagram? below). That may not be necessary, but I thought it was such a good idea that it should get some exposure. I've also given a detailed explanation, as the OP is new to Ruby, as might be other readers.]

    You might consider doing it as follows.

    Code

    def anagrams(a, b)
      return nil unless a.size == b.size
      a.zip(b).map { |aw,bw| anagram?(aw,bw) ? 1 : 0 }
    end
    
    def anagram?(aw, bw)
      return false unless aw.size == bw.size
      counts = aw.downcase.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }
      bw.downcase.each_char do |c|
        return false unless counts[c] > 0
        counts[c] -= 1
      end
      true
    end
    

    Example

    a = ['hello', 'goodbye', 'pants', 'baa']
    b = ['helio', 'godbye', 'Spant', 'aba']
    anagrams(a, b)
      #=> [0, 0, 1, 1]
    

    Explanation

    anagrams method

    For the example above,

    a.size #=> 4
    b.size #=> 4
    

    so we don't return nil in the first line of anagrams.

    Next,

    c = a.zip(b)
      #=> [["hello", "helio"], ["goodbye", "godbye"],
      #    ["pants", "Spant"], ["baa", "aba"]]
    

    Assuming for a moment that anagram? works as desired:

    c.map { |e| anagram?(e.first, e.last) ? 1 : 0 }
      #=> [0, 0, 1, 1]
    

    Enumerable#map passes each element of c (a two-element array) into the block.1. It is clearer, however, to decompose (or "disambiguate") those arrays and assign each of the two words they comprise to a block variable2:

    c.map { |aw,bw| anagram?(aw,bw) ? 1 : 0 }
      #=> [0, 0, 1, 1]
    

    The first element passed in is ["hello", "helio"], so

    aw => "hello"
    bw #=> "helio"
    

    and we execute

    anagram?("hello", "helio") ? 1 : 0
      #=> 0
    

    which is shorthand for

    if anagram?("hello", "helio")
      1
    else
      0
    end
      #=> 0
    

    anagram? method

    So now let's move on to anagram?, with

    aw = "hello"
    bw = "helio"
    

    Since

    aw.size == bw.size #=> true
    

    we don't return.

    Count frequency of letters in the first word

    Let me now write the next few lines of anagram? slightly differently:

    counts = Hash.new(0)
      #=> {}
    aw_down = aw.downcase 
      #=> "hello"
    aw_down.each_char { |c| counts[c] += 1 }
      #=> "hello"
    counts
      #=> {"h"=>1, "e"=>1, "l"=>2, "o"=>1}
    

    (The last line is there just to show the value of the hash.)

    In the first line we create a hash counts with a default value of zero. All this means is that if counts does not contain the key k, counts[k] will return the default value. Very important: doing so does not change the hash!3

    String#each_char4 passes each character of "hello" into the block and assigns it to the block variable c. Initially, c='h' and h={}. We then execute

    counts['h'] += 1
    

    which is shorthand for

    counts['h'] = counts['h'] + 1
    

    Since counts does not yet have a key 'h', counts['h'] on the right returns the default value:

    counts['h'] = 0 + 1 #=> 1
    counts #=> {"h"=>1}
    

    Similarly, after 'e' and the first 'l' are passed to the block, we have:

    counts #=> {"h"=>1, "e"=>1, "l"=>1} 
    

    However, when we pass the second 'l', we execute

    counts['l'] = counts['l'] + 1
      #=>    1 + 1
      #=> 2
    

    and we finish up with

    counts #=> {"h"=>1, "e"=>1, "l"=>2, "o"=>1}
    

    The method Enumerable#each_with_object will become a good friend

    This method is used merely to save some steps. It allows us to write:

    counts = Hash.new(0)
    aw_down.each_char { |c| counts[c] += 1 }
    

    as

    counts = aw_down.each_with_object(Hash.new(0)) { |c,h| h[c] += 1 }
    

    and we can also get rid of the line

    aw_down = aw.downcase 
    

    by writing

    counts = aw.downcase.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }
    

    This may seem like a small saving, but there are many other situations where the use of each_with_object and other Enumerable class methods permit the chaining of methods, which is extremely useful.

    Decrementing letter counts for letters in the second word

    Recall

    counts #=> {"h"=>1, "e"=>1, "l"=>2, "o"=>1}
    

    We now execute

    bw_down = bw.downcase
      #=> "helio"
    "helio".each_char do |c|
      return false unless counts[c] > 0
      counts[c] -= 1
    end
    

    First, 'h' is passed into the block. As counts['h'] #=> 1, we execute counts['h'] -= 1, so now

    counts #=> {"h"=>0, "e"=>1, "l"=>2, "o"=>1}`.
    

    After passing 'e' and 'l' to the block,

    counts #=> {"h"=>0, "e"=>0, "l"=>1, "o"=>1}
    

    but when we pass 'i', we find

    counts['i'] #=> 0
    

    (i.e., the default value of zero is returned, and we don't want to set counts['i'] to -1) so we return false, having concluded that the two words are not anagrams. (Had the second word been "heeio", we would have returned false when the second 'e' was passed to the block.)

    Do we have an anagram?

    Since two two words have the same length, if we are able to process all characters of the second word without returning false, we must end up with

    counts #=> {"h"=>0, "e"=>0, "l"=>0, "o"=>0}
    

    (no need to check!), meaning the two words are anagrams, so in this case we would return true to anagrams.5 Hence, the last line of anagram?.

    Notes

    1 Under the hood, this is what's happening:

    enum = c.map
      #=> #<Enumerator: [["hello", "helio"], ["goodbye", "godbye"],
      #                  ["pants", "Spant"], ["baa", "aba"]]:map>
    

    Here we can see what elements the enumerator will pass into the block, but sometimes you need to convert the enumerator to an array to get that information:

    enum.to_a
      #=> [["hello", "helio"], ["goodbye", "godbye"],
      #    ["pants", "Spant"], ["baa", "aba"]]
    

    It is actually the method Array#each that passes the elements of enum into the block:

    enum.each { |aw,bw| anagram?(aw,bw) ? 1 : 0 }
      #=> [0, 0, 1, 1]
    

    2 If we pass [[1,2],3] into a block, and the block variables are written |(a,b),c|, then a=>1, b=>2, c=>3. This is quite handy. Cool, eh?.

    3

    h = Hash.new('pig')
    h['dog'] = 7 #=> 7
    h            #=> {"dog"=>7}
    h[0]         #=> "pig"
    h['cat']     #=> "pig"
    h[{:a=>1}]   #=> "pig"
    h            #=> {"dog"=>7}
    

    Note there is a form of Hash#new that takes block, which allows keys not in the hash to be added when they are referenced.

    4 Instead of aw_down.each_char we could have written aw_down.chars.each, but aw_down.chars creates an unnecessary intermediate array. each_char, an enumerator, merely passes values as they are required.

    5 We could return 0 rather than false and 1 rather than true, in which case we could write

    a.zip(b).map { |aw,bw| anagram?(aw,bw) }
    

    in anagrams, but wouldn't it be clearer to have anagrams return an array whose values are true or false, rather than 0 or 1?