ruby fiber

Control flow in a Ruby fiber program


I know that fibers are cooperative threads. A fiber has control of the execution context, whereas a preemptive thread does not. A fiber can yield control, which means a fiber can start and stop in well-defined places.

Apparently, fibers are used in evented Ruby to clean up the nested blocks caused by the reactor pattern.

But I have difficulty grasping the control flow of the script below, which uses a fiber.

def http_get(url)
  f = Fiber.current
  http = EventMachine::HttpRequest.new(url).get

  # resume fiber once http call is done
  http.callback { f.resume(http) }
  http.errback  { f.resume(http) }

  return Fiber.yield
end

EventMachine.run do
  Fiber.new{
    page = http_get('http://www.google.com/')
    puts "Fetched page: #{page.response_header.status}"

    if page
      page = http_get('http://www.google.com/search?q=eventmachine')
      puts "Fetched page 2: #{page.response_header.status}"
    end
  }.resume
end

The way I understand it:

1) EM starts its event loop

2) A fiber is created and then the resume is called. Does the block of code passed to new get executed right away or does it get executed after resume is invoked?

3) http_get is called the first time. It starts an asynchronous request (EventMachine waits on it using select, poll or epoll on Linux). We set up the event handlers for that request (the callback and errback blocks). Then the Fiber voluntarily yields control to the thread EventMachine is on (the main thread). However, as soon as the callback is invoked, it takes control back with f.resume(http). But in this simplified example, am I supposed to put my own callback code after f.resume(http)? Because right now it seems like f.resume(http) just returns control to the fiber and does nothing else.

I think what happens after yield is the control goes to EventMachine where it enters its event loop. So the second http_get is not invoked yet. Now once the callback is invoked, then control is returned to the Fiber (we only use one Fiber.new so I assume there is only one Fiber instance in all this). But when does the second http_get get called?


Solution

  • Let me see if I can answer it for you. I am adding line numbers to aid the description:

    01: def http_get(url)
    02:   f = Fiber.current
    03:   http = EventMachine::HttpRequest.new(url).get
    04: 
    05:   # resume fiber once http call is done
    06:   http.callback { f.resume(http) }
    07:   http.errback  { f.resume(http) }
    08: 
    09:   return Fiber.yield
    10: end
    11: 
    12: EventMachine.run do
    13:   Fiber.new{
    14:     page = http_get('http://www.google.com/')
    15:     puts "Fetched page: #{page.response_header.status}"
    16: 
    17:     if page
    18:       page = http_get('http://www.google.com/search?q=eventmachine')
    19:       puts "Fetched page 2: #{page.response_header.status}"
    20:     end
    21:   }.resume
    22: end
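
    Before the walkthrough, here is a minimal sketch of the two primitives the script relies on, using plain Ruby and no EventMachine. The variable names (trace, first, second) are made up for illustration. It shows that the block given to Fiber.new does not run until resume is called, and that the argument of a later resume becomes the return value of Fiber.yield:

```ruby
require 'fiber' # provides Fiber.current and Fiber#alive? on older Rubies

trace = []

fiber = Fiber.new do
  trace << :block_started
  received = Fiber.yield(:paused) # suspend; :paused is handed to the caller of resume
  trace << received               # runs only after the second resume
  :finished                       # the block's value is returned by the final resume
end

trace << :fiber_created           # nothing inside the block has run yet
first  = fiber.resume             # runs the block up to Fiber.yield
second = fiber.resume(:hello)     # Fiber.yield returns :hello inside the fiber

p trace        # [:fiber_created, :block_started, :hello]
p first        # :paused
p second       # :finished
p fiber.alive? # false
```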
    
    1. Line 21 starts the execution of the Fiber whose code is in Lines 14-20. The block passed to Fiber.new does not run when the Fiber is created; it runs only once resume is called.
    2. The Fiber code does the following: Line 14 performs a GET on google.com. Line 17 checks whether http_get returned a response and, if so, Line 18 performs a second request that searches for the string eventmachine.
    3. When the Fiber execution starts due to .resume at Line 21, Line 14 gets executed which invokes http_get method.
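    4. Lines 02 to 07 set up the async HTTP GET request and its callbacks.
    5. Line 09 suspends the Fiber. Control returns to the code that called resume (Line 21), the EventMachine.run block finishes, and EventMachine carries on with its event loop.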
    6. Some time later the async HTTP GET started on Line 03 completes and EventMachine invokes one of the callbacks on Line 06 or 07. The callback calls f.resume(http), which hands control back to the original Fiber created on Lines 13 to 21.
    7. The Fiber resumes inside http_get at Line 09: Fiber.yield returns the http object that the callback passed to f.resume, and http_get returns it. Line 14 assigns that object to page, and Line 15 uses it to print the HTTP response status.
    8. As the Fiber continues, it checks whether page is truthy and, if so, calls http_get again with a new URL. Note that the if page check is effectively redundant: had page been nil, Line 15 would already have raised when calling response_header on it. In this script both callbacks pass the http object to f.resume, so page is never nil.
    9. A similar process repeats: Lines 02 to 07 set up the HTTP GET call and Line 09 yields control back to EventMachine.
    10. After some time, one of the callbacks is invoked, the Fiber regains control, and Line 19 gets executed.
    11. After Line 19 the block ends and the Fiber becomes dead; it cannot be resumed again.
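
    The steps above can be traced with a small simulation that needs no network access and no EventMachine. Everything named here (fake_http_get, pending, trace, the response Hash) is a hypothetical stand-in: pending plays the role of the callbacks EventMachine would fire when I/O completes, and the until loop is only a toy event loop, not how EventMachine is implemented.

```ruby
require 'fiber'

trace   = [] # records the order in which things happen
pending = [] # hypothetical stand-in for callbacks the reactor will fire later

# Same shape as http_get: register a callback that resumes the fiber, then yield.
def fake_http_get(url, pending, trace)
  f = Fiber.current
  trace << "request #{url}"
  pending << -> { f.resume({ url: url, status: 200 }) } # like http.callback { f.resume(http) }
  result = Fiber.yield # suspend; control goes back to whoever called resume
  trace << "resumed #{url}"
  result
end

fiber = Fiber.new do
  page = fake_http_get('first', pending, trace)
  trace << "status #{page[:status]}"
  page = fake_http_get('second', pending, trace)
  trace << "status #{page[:status]}"
end

fiber.resume # like Line 21: runs the block until the first Fiber.yield
trace << 'back in loop'

# Toy event loop: fire callbacks until none are left.
until pending.empty?
  pending.shift.call
  trace << 'back in loop'
end

puts trace
# request first
# back in loop
# resumed first
# status 200
# request second
# back in loop
# resumed second
# status 200
# back in loop
```

    The second request is issued from inside the first callback's f.resume call: the fiber runs on from the first Fiber.yield until it hits the next one, and only then does the callback return to the loop.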

    Hope that clarifies the matter.

    As for handling the response of the HTTP GET with additional logic, I guess you can replace the puts calls with some meaningful processing logic. In this sample the puts calls are where the responses get handled; the callbacks are used primarily to resume the Fiber, so nothing needs to follow f.resume(http) inside them.
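
    As a rough sketch of that idea, the variation below keeps the callbacks as one-liners that only resume the fiber, and puts the response handling in the fiber body. It is not taken from the original script: fake_request, pending, results and the [:ok, ...] / [:error, ...] tagging are all assumptions made for illustration, and the tagging is just one possible way to let the fiber tell a success callback from a failure callback.

```ruby
require 'fiber'

pending = [] # hypothetical queue standing in for the reactor's pending callbacks

# Hypothetical async call: schedules either a success or a failure callback.
# Each callback does nothing except resume the fiber with a tagged result.
def fake_request(url, pending, succeed: true)
  f = Fiber.current
  if succeed
    pending << -> { f.resume([:ok, "body of #{url}"]) }
  else
    pending << -> { f.resume([:error, "could not fetch #{url}"]) }
  end
  Fiber.yield # returns whatever the callback passed to resume
end

results = []

Fiber.new do
  %w[good bad].each do |url|
    outcome, payload = fake_request(url, pending, succeed: url == 'good')
    # Response handling lives here, in straight-line code, not in the callbacks.
    results << (outcome == :ok ? payload.upcase : "handled: #{payload}")
  end
end.resume

# Toy event loop: fire callbacks until none are left.
pending.shift.call until pending.empty?

p results # ["BODY OF GOOD", "handled: could not fetch bad"]
```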