
How to find all referenced files in Ruby


I want to detect all the files that a Ruby file directly references, for documentation purposes. Reading the list of require statements is not enough, because some files are only imported transitively and others are imported but never used. For example:

a.rb:

require 'b'
require 'e'
class A; end
B.new; C.new

b.rb:

require 'c'
require 'd'
class B; end
C.new; D.new

c.rb:
class C; end

(d.rb and e.rb are just like c.rb)

Then the list I want to get for a.rb is b.rb, c.rb. No D or E because they are not directly referenced. Hope this makes sense!


Solution

  • There's some fuzziness here regarding what 'used' means. Clearly d is used, since b.rb (which is also used) calls D.new at the end. If we narrow 'used' to mean "code was executed from that file, other than during the require process", then the following code is as close as I can get on Ruby 1.9.3:

    require 'set'
    def analyze(filename)
      require_depth = 0   # greater than zero while a require is in progress
      files = Set.new
      set_trace_func(lambda do |event, file, line, id, binding, classname|
        case event
        when 'call' then require_depth += 1 if id == :require && classname == Kernel
        when 'return' then require_depth -= 1 if id == :require && classname == Kernel
        when 'line'
          # Record the file of every line executed outside a require.
          files << file if require_depth == 0
        end
      end)
      load filename
      set_trace_func nil
      # Drop this script and library files such as the rubygems require patch.
      files.reject {|f| f == __FILE__ || f =~ %r{/lib/ruby/site_ruby}}
    end
    

    You'd use it by running analyze 'a.rb' (assuming that all the files involved are on the load path). It uses Ruby's set_trace_func to listen to what's going on. The first part is a crude attempt to ignore everything that happens during a call to require. Then we accumulate the filename of every line of executed Ruby. The last line just clears up junk (e.g. the rubygems file that patches require).

    This doesn't actually work for the test example: when B.new runs, no lines of code from b.rb are actually executed. However, if B (and C, D etc.) have initialize methods (or some other line of code that gets called) then you should get the desired result. It's a pretty simplistic approach and could be fooled in all sorts of ways. In particular, if you call a method on (say) B but the implementation of that method isn't in b.rb (e.g. an accessor defined with attr_accessor), then b.rb isn't logged.
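
    The limitation described above can be seen in isolation with a small sketch. It uses TracePoint's :line event rather than set_trace_func, and the helper names (files_with_line_events, demo_line_events) and the two throwaway classes are made up for illustration; they are not part of the answer's code.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical helper (not from the answer): basenames of every file in which
# a :line event fired while the block ran.
def files_with_line_events
  seen = Set.new
  TracePoint.new(:line) { |tp| seen << File.basename(tp.path) }.enable { yield }
  seen
end

# Writes two throwaway files, loads them, then traces only the instantiations.
def demo_line_events
  Dir.mktmpdir do |dir|
    plain = File.join(dir, 'plain.rb')
    with_init = File.join(dir, 'with_init.rb')
    # Plain has no Ruby-level initialize, so Plain.new runs no line of plain.rb.
    File.write(plain, "class Plain\nend\n")
    # WithInit#initialize has a body, so WithInit.new executes a line of with_init.rb.
    File.write(with_init, "class WithInit\n  def initialize\n    @ready = true\n  end\nend\n")
    load plain
    load with_init
    seen = files_with_line_events { Plain.new; WithInit.new }
    { plain: seen.include?('plain.rb'), with_init: seen.include?('with_init.rb') }
  end
end

p demo_line_events
```

    The :plain entry should come back false and the :with_init entry true, which is the same reason the test example needs initialize methods before line tracing notices b.rb.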

    You might be able to use the call event better but I don't think much more can be done with set_trace_func.

    If you are using Ruby 2.0 or later then you can use TracePoint, which is the replacement for set_trace_func. It has slightly different semantics; in particular, when we track a method call it's easier to get the object it was called on, so the following version

    require 'set'
    def analyze(filename)
      require_depth = 0
      files = Set.new
      classes_to_files = {}
      trace = TracePoint.new(:call, :c_call, :return, :c_return, :line, :class) do |tp|
        case tp.event
        when :class
          # Remember which file opened each class body.
          classes_to_files[tp.self] = tp.path
        when :call, :c_call
          if tp.method_id == :require && tp.defined_class == Kernel
            require_depth += 1
          elsif require_depth == 0
            # Attribute the call to the file that defined the receiver (or its class).
            path = classes_to_files[tp.self] || classes_to_files[tp.self.class]
            files << path if path
          end
        when :return, :c_return
          # Mirror every event that incremented the counter, including C-level requires.
          require_depth -= 1 if tp.method_id == :require && tp.defined_class == Kernel
        when :line
          files << tp.path if require_depth == 0
        end
      end
    
      trace.enable
      load filename
      trace.disable
      files.reject {|f| f == __FILE__ || f =~ %r{/lib/ruby/site_ruby}}
    end
    

    does return a.rb, b.rb and c.rb for the test example. It's still subject to the fundamental limitation that it only knows about code that actually gets executed.
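
    To show why the :class bookkeeping catches a bare B.new, here is a minimal sketch of just that part: record which file opens each class, then look up the receiver of each traced call. The helper names (class_definition_sites, demo_attribution) and the Widget class are hypothetical, and the sketch ignores the require filtering done in the full version.

```ruby
require 'tmpdir'

# Hypothetical helper: load a file and record which file opened each class body.
def class_definition_sites(path)
  sites = {}
  TracePoint.new(:class) { |tp| sites[tp.self] = tp.path }.enable { load path }
  sites
end

# Defines Widget in a throwaway file, then attributes Widget.new to that file
# even though no line of the file runs during the call.
def demo_attribution
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'widget.rb')
    File.write(path, "class Widget\n  attr_accessor :size\nend\n")
    sites = class_definition_sites(path)

    attributed = []
    trace = TracePoint.new(:call, :c_call) do |tp|
      # The receiver is either the class itself (Widget.new) or an instance of it.
      file = sites[tp.self] || sites[tp.self.class]
      attributed << File.basename(file) if file
    end
    trace.enable { Widget.new }
    attributed.uniq
  end
end

p demo_attribution
```

    Widget has no initialize of its own, so line tracing alone would miss widget.rb; the lookup by receiver is what attributes the call to it.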