I want to detect all the files a Ruby file directly references, for documentation purposes. Reading the list of requires is not enough, because some files are imported transitively and others are imported but never used. For example:
a.rb:
require 'b'
require 'e'
class A; end
B.new; C.new
b.rb:
require 'c'
require 'd'
class B; end
C.new; D.new
c.rb:
class C; end
(d.rb and e.rb are just like c.rb)
Then the list I want to get for a.rb is b.rb, c.rb. No D or E because they are not directly referenced. Hope this makes sense!
So there's some fuzziness here regarding what 'used' means. Clearly d is used, since b.rb (which is also used) calls D.new at the end. If we caveat 'used' to mean "code was executed from that file, other than during the require process" then the following code is as close as I can get on Ruby 1.9.3:
require 'set'

def analyze(filename)
  require_depth = 0
  files = Set.new
  set_trace_func(lambda do |event, file, line, id, binding, classname|
    case event
    # Track how deeply nested we are inside calls to require
    when 'call' then require_depth += 1 if id == :require && classname == Kernel
    when 'return' then require_depth -= 1 if id == :require && classname == Kernel
    when 'line'
      # Only record lines executed outside of any require
      files << file if require_depth == 0
    end
  end)
  load filename
  set_trace_func nil
  # Drop this script itself and the rubygems file that patches require
  files.reject { |f| f == __FILE__ || f =~ %r{/lib/ruby/site_ruby} }
end
You'd use it by running analyze 'a.rb' (assuming that all the files involved are on the load path). This uses Ruby's set_trace_func to listen to what's going on. The first part is a crude attempt to ignore everything that happens during a call to require. Then we accumulate the filename of every line of executed Ruby. The last line just clears up junk (e.g. the rubygems file that patches require).
This doesn't actually work for the test example: when B.new runs, no lines of code from b.rb are actually executed. However, if B (and C, D etc.) have initialize methods (or some other line of code that gets called) then you should get the desired result. It's pretty simplistic and could be fooled in all sorts of ways. In particular, if you call a method on (say) B, but the implementation of that method isn't in b.rb (e.g. an accessor defined with attr_accessor), then b.rb isn't logged.
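To make that limitation concrete, here is a minimal sketch using TracePoint's :line event (so it assumes Ruby 2.0 or later). The helper files_with_executed_lines, the Widget class and the widget.rb file are hypothetical names made up for this illustration; they are not part of the analyzer above.

```ruby
require 'tmpdir'
require 'set'

# Hypothetical helper: basenames of every file in which a :line event fires
# while the given block runs.
def files_with_executed_lines
  files = Set.new
  trace = TracePoint.new(:line) { |tp| files << File.basename(tp.path) }
  trace.enable
  yield
  files
ensure
  trace.disable if trace
end

quiet, noisy = Dir.mktmpdir do |dir|
  path = File.join(dir, 'widget.rb')
  File.write(path, <<~RUBY)
    class Widget
      attr_accessor :name
      def label
        name.to_s.upcase
      end
    end
  RUBY
  load path # define Widget before tracing starts

  widget = nil
  # Construction and attr_accessor calls run no Ruby lines inside widget.rb.
  quiet = files_with_executed_lines { widget = Widget.new; widget.name = 'x'; widget.name }
  # A method with a Ruby body does.
  noisy = files_with_executed_lines { widget.label }
  [quiet, noisy]
end

puts "widget.rb seen after new/accessors: #{quiet.include?('widget.rb')}"
puts "widget.rb seen after calling label: #{noisy.include?('widget.rb')}"
```

Both sets also contain the script that runs the blocks, since the blocks themselves are lines of Ruby; the interesting part is only whether widget.rb shows up.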
You might be able to use the call event better but I don't think much more can be done with set_trace_func.
If you are using Ruby 2.0 then you can use TracePoint, which is the replacement for set_trace_func. It has slightly different semantics; in particular, when we track a method call it's easier to get the class it was called on, so:
require 'set'

def analyze(filename)
  require_depth = 0
  files = Set.new
  classes_to_files = {}
  trace = TracePoint.new(:call, :line, :return, :c_call, :class) do |tp|
    case tp.event
    when :class
      # Remember which file each class was defined in
      classes_to_files[tp.self] = tp.path
    when :call, :c_call
      if tp.method_id == :require && tp.defined_class == Kernel
        require_depth += 1
      elsif require_depth == 0
        # Attribute the call to the file that defined the receiver (or its class)
        if path = classes_to_files[tp.self] || classes_to_files[tp.self.class]
          files << path
        end
      end
    when :return
      require_depth -= 1 if tp.method_id == :require && tp.defined_class == Kernel
    when :line
      files << tp.path if require_depth == 0
    end
  end
  trace.enable
  load filename
  trace.disable
  # Drop this script itself and the rubygems file that patches require
  files.reject { |f| f == __FILE__ || f =~ %r{/lib/ruby/site_ruby} }
end
This version does return a, b and c for the test example. It's still subject to the fundamental limitation that it only knows about code that actually gets executed.
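As a rough illustration of that last point, the sketch below isolates the class-to-file mapping idea and runs it against a branch that is only taken one way. It is not the analyzer above: files_of_called_classes, Left, Right and the temporary file names are all hypothetical, and file names are stored as basenames purely to keep the output short.

```ruby
require 'tmpdir'
require 'set'

# Hypothetical helper: files whose classes (or instances of them) receive
# a method call while the given block runs.
def files_of_called_classes(classes_to_files)
  used = Set.new
  trace = TracePoint.new(:call, :c_call) do |tp|
    path = classes_to_files[tp.self] || classes_to_files[tp.self.class]
    used << path if path
  end
  trace.enable
  yield
  used
ensure
  trace.disable if trace
end

# Record where each class is defined, using the :class event.
classes_to_files = {}
definitions = TracePoint.new(:class) do |tp|
  classes_to_files[tp.self] = File.basename(tp.path)
end

Dir.mktmpdir do |dir|
  %w[Left Right].each do |name|
    File.write(File.join(dir, "#{name.downcase}.rb"), "class #{name}; end\n")
  end
  definitions.enable
  load File.join(dir, 'left.rb')
  load File.join(dir, 'right.rb')
  definitions.disable
end

flag = true
used = files_of_called_classes(classes_to_files) do
  # Right is referenced in the source, but this branch never runs.
  flag ? Left.new : Right.new
end

puts used.to_a.inspect
```

Here right.rb is referenced in the source text but never reported, because nothing was ever called on Right. A tracing approach cannot see that reference; only static analysis of the source could.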