ruby enumerator

How does Ruby Enumerator chaining work exactly?


Consider the following code:

[1,2,3].map.with_index { |x, i| x * i }
# => [0,2,6]

How does this work exactly?

My mental model of map is that it iterates and applies a function to each element. Is with_index somehow passing a function to the enumerator [1,2,3].map, and if so, what would that function be?

This SO thread shows how enumerators pass data through, but doesn't answer the question. Indeed, if you replace map with each then the behaviour is different:

[1,2,3].each.with_index { |x, i| x * i }
# => [1,2,3]

map seems to carry the information that a function has to be applied, on top of carrying the data to iterate over. How does that work?


Solution

  • Todd's answer is excellent, but I feel like seeing some more Ruby code might be beneficial. Specifically, let's try to write each and map on Array ourselves.

    I won't use any Enumerable or Enumerator methods directly, so we can see how it all works under the hood. (I'll still use for loops, which technically call #each under the hood, but that's only cheating a little.)

    First, there's each. each is easy. It iterates over the array and applies a function to each element, before returning the original array.

    def my_each(arr, &block)
      for i in 0..arr.length-1
        block[arr[i]]
      end
      arr
    end
    

    Simple enough. Now what if we don't pass a block? Let's change it up a bit to support that. We effectively want to delay the actual iteration so that the Enumerator can do its thing.

    def my_each(arr, &block)
      if block
        for i in 0..arr.length-1
          block[arr[i]]
        end
        arr
      else
        Enumerator.new do |y|
          my_each(arr) { |*x| y.yield(*x) }
        end
      end
    end
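
    To see the delay in action, here is a quick check. It is only a sketch: it repeats the definition above so it runs on its own, and the variable names (seen, with_block, enum, first, all) are made up for illustration.

    ```ruby
    # Same definition as above, repeated so the snippet is self-contained.
    def my_each(arr, &block)
      if block
        for i in 0..arr.length-1
          block[arr[i]]
        end
        arr
      else
        Enumerator.new do |y|
          my_each(arr) { |*x| y.yield(*x) }
        end
      end
    end

    # With a block: iterates immediately and returns the original array.
    seen = []
    with_block = my_each([1, 2, 3]) { |x| seen << x * 10 }
    # with_block => [1, 2, 3], seen => [10, 20, 30]

    # Without a block: nothing is iterated yet, we only get an Enumerator back.
    enum = my_each([1, 2, 3])
    first = enum.next # pulls a single element on demand => 1
    all = enum.to_a   # runs the whole iteration => [1, 2, 3]
    ```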
    

    So if we don't pass a block, we make an Enumerator that, when consumed, calls my_each, using the Enumerator's yielder object as the block. The y object is a funny thing, but you can just think of it as basically being the block you'll eventually pass in. So, in

    my_each([1, 2, 3]).with_index { |x, i| x * i }
    

    think of y as being like the { |x, i| x * i } bit. It's a bit more complicated than that (with_index wraps your block in one that also supplies the index), but that's the idea.
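
    To make the "bit more complicated" part concrete, here is a rough sketch of what with_index might do. my_with_index is a hypothetical stand-in written for illustration, not the real (C) implementation; the point is that it calls each on the enumerator with a wrapper block, and that y.yield hands back whatever that block returns.

    ```ruby
    # Hypothetical stand-in for Enumerator#with_index, for illustration only.
    def my_with_index(enum, &block)
      i = -1
      # Whatever enum.each returns is what we return too.
      enum.each do |*args|
        i += 1
        block.call(*args, i) # the block's result travels back through y.yield
      end
    end

    returned = []
    enum = Enumerator.new do |y|
      # Record what y.yield gives back for each element.
      [1, 2, 3].each { |x| returned << y.yield(x) }
      :done # the Enumerator block's own return value
    end

    result = my_with_index(enum) { |x, i| x * i }
    # returned => [0, 2, 6]  (y.yield returned the block's results)
    # result   => :done      (each returned the Enumerator block's value)
    ```

    That second point is the whole trick: the method wrapped inside the Enumerator decides what to do with the values coming back from y.yield, and what to return at the end.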

    Incidentally, on Ruby 2.7 and later, the Enumerator::Yielder object got its own #to_proc, so if you're on a recent Ruby version, you can just do

    Enumerator.new do |y|
      my_each(arr, &y)
    end
    

    rather than

    Enumerator.new do |y|
      my_each(arr) { |*x| y.yield(*x) }
    end
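
    As a quick sanity check that the two spellings are interchangeable, here is a small sketch. It assumes Ruby 2.7 or later and uses the built-in Array#map instead of my_each to stay short; the variable names are made up for illustration.

    ```ruby
    # Explicit block that forwards to the yielder.
    explicit = Enumerator.new do |y|
      [1, 2, 3].map { |*x| y.yield(*x) }
    end

    # Same thing via Enumerator::Yielder#to_proc (Ruby 2.7+).
    via_to_proc = Enumerator.new do |y|
      [1, 2, 3].map(&y)
    end

    a = explicit.with_index { |x, i| x * i }
    b = via_to_proc.with_index { |x, i| x * i }
    # a => [0, 2, 6], b => [0, 2, 6]
    ```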
    

    Now let's extend this approach to map. Writing map with a block is easy. It's just like each but we accumulate the results.

    def my_map(arr, &block)
      result = []
      for i in 0..arr.length-1
        result << block[arr[i]]
      end
      result
    end
    

    Simple enough. Now what if we don't pass a block? Let's do the exact same thing we did for my_each. That is, we're just going to make an Enumerator and, inside that Enumerator, we call my_map.

    def my_map(arr, &block)
      if block
        result = []
        for i in 0..arr.length-1
          result << block[arr[i]]
        end
        result
      else
        Enumerator.new do |y|
          my_map(arr) { |*x| y.yield(*x) }
        end
      end
    end
    

    Now, the Enumerator knows that, whenever it eventually gets a block, it's going to use my_map on that block at the end. We can see that these two functions actually behave, on arrays, like map and each do:

    my_each([1, 2, 3]).with_index { |x, i| x * i } # => [1, 2, 3]
    my_map([1, 2, 3]).with_index { |x, i| x * i }  # => [0, 2, 6]
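
    If you want to check this yourself, the following sketch compares both functions against the built-ins. The definitions are condensed versions of the ones above (an early return instead of if/else) so the snippet runs on its own. Note there is no space between the method name and the opening parenthesis: with a space, Ruby would parse the whole ([1, 2, 3]).with_index { ... } expression as the argument.

    ```ruby
    def my_each(arr, &block)
      return Enumerator.new { |y| my_each(arr) { |*x| y.yield(*x) } } unless block
      for i in 0..arr.length-1
        block[arr[i]]
      end
      arr
    end

    def my_map(arr, &block)
      return Enumerator.new { |y| my_map(arr) { |*x| y.yield(*x) } } unless block
      result = []
      for i in 0..arr.length-1
        result << block[arr[i]]
      end
      result
    end

    data = [1, 2, 3]

    ours_each = my_each(data).with_index { |x, i| x * i } # => [1, 2, 3]
    real_each = data.each.with_index { |x, i| x * i }     # => [1, 2, 3]

    ours_map = my_map(data).with_index { |x, i| x * i }   # => [0, 2, 6]
    real_map = data.map.with_index { |x, i| x * i }       # => [0, 2, 6]

    # with_index also accepts a starting offset.
    offset = my_map(data).with_index(1) { |x, i| x * i }  # => [1, 4, 9]
    ```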
    

    So your intuition was spot on:

    map seems to carry the information that a function has to be applied, on top of carrying the data to iterate over. How does that work?

    That's exactly what it does. map creates an Enumerator whose block knows to call map at the end, whereas each does the same but with each. Of course, in reality, all of this is implemented in C for efficiency and bootstrapping reasons, but the fundamental idea is still there.
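
    You can also see this on the built-in enumerators directly. As a small illustration (variable names are made up), inspect shows which method each Enumerator will call once it gets a block, and to_enum lets you build the same thing by hand:

    ```ruby
    map_enum  = [1, 2, 3].map
    each_enum = [1, 2, 3].each

    # The Enumerator remembers both the receiver and the method name.
    map_desc  = map_enum.inspect  # => "#<Enumerator: [1, 2, 3]:map>"
    each_desc = each_enum.inspect # => "#<Enumerator: [1, 2, 3]:each>"

    # Giving the Enumerator a block calls the remembered method with it.
    doubled = map_enum.each { |x| x * 2 } # => [2, 4, 6]

    # to_enum(:map) builds the same kind of Enumerator explicitly.
    explicit = [1, 2, 3].to_enum(:map)
    indexed  = explicit.with_index { |x, i| x * i } # => [0, 2, 6]
    ```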