There is a lot of talk about decoupling algorithms from the classes they operate on, but one thing is never explained.
A visitor is typically used like this:
abstract class Expr {
    public <T> T accept(Visitor<T> visitor) { return visitor.visit(this); }
}
class ExprVisitor extends Visitor<Integer> {
    public Integer visit(Num num) {
        return num.value;
    }
    public Integer visit(Sum sum) {
        return sum.getLeft().accept(this) + sum.getRight().accept(this);
    }
    public Integer visit(Prod prod) {
        return prod.getLeft().accept(this) * prod.getRight().accept(this);
    }
}
Instead of calling visit(element) directly, the visitor asks the element to call the visitor's visit method. This contradicts the declared idea that the classes are unaware of visitors.
PS1 Please explain in your own words or point to an exact explanation, because the two responses I got refer to something general and vague.
PS2 My guess: Since getLeft() returns the basic Expression, calling visit(getLeft()) would result in visit(Expression), whereas getLeft() calling visit(this) will result in another, more appropriate, visit invocation. So, accept() performs the type conversion (aka casting).
PS3 Scala's Pattern Matching = Visitor Pattern on Steroid shows how much simpler the Visitor pattern is without the accept method. Wikipedia adds to this statement by linking a paper showing "that accept() methods are unnecessary when reflection is available; introduces term 'Walkabout' for the technique."
The visitor pattern's visit/accept constructs are a necessary evil due to C-like languages' (C#, Java, etc.) semantics. The goal of the visitor pattern is to use double-dispatch to route your call as you'd expect from reading the code.
Normally when the visitor pattern is used, an object hierarchy is involved where all the nodes are derived from a base Node type, referred to henceforth as Node. Instinctively, we'd write it like this:
Node root = GetTreeRoot();
new MyVisitor().visit(root);
Herein lies the problem. If our MyVisitor class was defined like the following:
class MyVisitor implements IVisitor {
    public void visit(CarNode node) { /* ... */ }
    public void visit(TrainNode node) { /* ... */ }
    public void visit(PlaneNode node) { /* ... */ }
    public void visit(Node node) { /* ... */ }
}
Then, regardless of the actual runtime type of root, our call would go into the overload visit(Node node). This would be true for all variables declared of type Node. Why is this? Because Java and other C-like languages only consider the static type of the argument, that is, the type the variable is declared as, when deciding which overload to call. Java doesn't take the extra step to ask, for every method call, at runtime, "Okay, what is the dynamic type of root? Oh, I see. It's a TrainNode. Let's see if there's any method in MyVisitor which accepts a parameter of type TrainNode...". The compiler determines at compile time which method will be called. (If Java did inspect the arguments' dynamic types on every call, performance would be pretty terrible.)
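To see the static overload choice in isolation, here is a minimal sketch. The class names mirror the ones in this answer, but OverloadDemo and the string return values are my own additions for illustration, not part of any real API.

```java
// Hypothetical node types, named after the ones used in the text.
class Node {}
class TrainNode extends Node {}

class OverloadDemo {
    // Two overloads; the compiler picks one from the argument's static type.
    static String visit(Node node)      { return "visit(Node)"; }
    static String visit(TrainNode node) { return "visit(TrainNode)"; }

    public static void main(String[] args) {
        Node root = new TrainNode();                  // static type Node, dynamic type TrainNode
        System.out.println(visit(root));              // prints "visit(Node)"
        System.out.println(visit(new TrainNode()));   // prints "visit(TrainNode)"
        System.out.println(visit((TrainNode) root));  // a cast changes the static type: prints "visit(TrainNode)"
    }
}
```

The object is the same in the first and third calls; only the declared type at the call site differs, and that alone decides the overload.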
Java does give us one tool for taking into account the runtime (i.e. dynamic) type of an object when a method is called -- virtual method dispatch. When we call a virtual method, the call actually goes through a table in memory that consists of function pointers. Each type has a table. If a particular method is overridden by a class, that class's function table entry will contain the address of the overriding function. If the class doesn't override a method, the entry will contain a pointer to the base class's implementation. This still incurs a performance overhead (each method call will basically be dereferencing two pointers: one pointing to the type's function table, and another pointing to the function itself), but it's still faster than having to inspect parameter types.
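A small sketch of virtual dispatch on the receiver, for contrast with overload selection. The describe() method and VirtualDispatchDemo are hypothetical names I made up for the example; how the JVM lays out its tables internally is not shown here.

```java
// Hypothetical hierarchy: one subclass overrides describe(), one does not.
class Node {
    String describe() { return "Node"; }
}
class TrainNode extends Node {
    @Override
    String describe() { return "TrainNode"; }   // replaces the inherited implementation
}
class PlaneNode extends Node {
    // No override: calls fall through to Node.describe().
}

class VirtualDispatchDemo {
    public static void main(String[] args) {
        Node a = new TrainNode();
        Node b = new PlaneNode();
        System.out.println(a.describe());  // prints "TrainNode": chosen by the dynamic type
        System.out.println(b.describe());  // prints "Node": inherited implementation
    }
}
```

Both variables are declared as Node, yet the method that runs depends on the object actually stored in them.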
The goal of the visitor pattern is to accomplish double-dispatch -- not only is the type of the call target considered (MyVisitor, via virtual methods), but also the type of the parameter (what type of Node are we looking at?). The Visitor pattern allows us to do this through the visit/accept combination.
By changing our line to this:
root.accept(new MyVisitor());
We can get what we want: via virtual method dispatch, we enter the correct accept() call as implemented by the subclass -- in our example with TrainNode, we'll enter TrainNode's implementation of accept():
class TrainNode extends Node implements IVisitable {
    @Override
    public void accept(IVisitor v) {
        v.visit(this);
    }
}
What does the compiler know at this point, inside the scope of TrainNode's accept? It knows that the static type of this is TrainNode. This is an important additional shred of information that the compiler was not aware of in our caller's scope: there, all it knew about root was that it was a Node. Now the compiler knows that this (root) is not just a Node, but actually a TrainNode. In consequence, the one line found inside accept(), v.visit(this), means something else entirely. The compiler will now look for an overload of visit() that takes a TrainNode. If it can't find one, it'll compile the call to an overload that takes a Node. If neither exists, you'll get a compilation error (unless you have an overload that takes Object). Execution will thus enter what we had intended all along: MyVisitor's implementation of visit(TrainNode node). No casts were needed, and, most importantly, no reflection was needed. Thus, the overhead of this mechanism is rather low: it consists only of pointer dereferences (two virtual method calls) and nothing else.
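Putting the two calls side by side makes the difference visible. This is only a sketch: the visit methods return a String so the chosen overload can be printed, whereas the snippets above use void, and DoubleDispatchDemo is a hypothetical name.

```java
interface IVisitor {
    String visit(Node node);
    String visit(TrainNode node);
}

class Node {
    // Inside Node, `this` has static type Node, so this compiles to visit(Node).
    String accept(IVisitor v) { return v.visit(this); }
}
class TrainNode extends Node {
    // Same source text, but here `this` has static type TrainNode,
    // so the compiler binds the call to visit(TrainNode).
    @Override
    String accept(IVisitor v) { return v.visit(this); }
}

class MyVisitor implements IVisitor {
    public String visit(Node node)      { return "visit(Node)"; }
    public String visit(TrainNode node) { return "visit(TrainNode)"; }
}

class DoubleDispatchDemo {
    public static void main(String[] args) {
        Node root = new TrainNode();
        System.out.println(new MyVisitor().visit(root));   // prints "visit(Node)"
        System.out.println(root.accept(new MyVisitor()));  // prints "visit(TrainNode)"
    }
}
```

The first dispatch (on root, via accept) is virtual; the second (the overload of visit) is resolved at compile time, but from inside the class that already knows its own type.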
You're right in your question -- we could use a cast and get the correct behavior. However, often we don't even know the concrete type of a given Node. Take the case of the following hierarchy:
abstract class Node { ... }
abstract class BinaryNode extends Node { Node left, right; }
class AdditionNode extends BinaryNode { }
class MultiplicationNode extends BinaryNode { }
class LiteralNode extends Node { int value; }
Suppose we were writing a simple compiler which parses a source file and produces an object hierarchy that conforms to the specification above. An interpreter for that hierarchy, implemented as a visitor, could look like this:
class Interpreter implements IVisitor<Integer> {
    // Java generics cannot take primitives, so the visitor returns Integer.
    public Integer visit(AdditionNode n) {
        int left = n.left.accept(this);
        int right = n.right.accept(this);
        return left + right;
    }
    public Integer visit(MultiplicationNode n) {
        int left = n.left.accept(this);
        int right = n.right.accept(this);
        return left * right;
    }
    public Integer visit(LiteralNode n) {
        return n.value;
    }
}
Casting wouldn't get us very far, since we don't know the types of left or right in the visit() methods. Our parser would most likely also just return an object of type Node pointing at the root of the hierarchy, so we can't cast that safely either. So our simple interpreter can look like:
Node program = parse(args[0]);
int result = program.accept(new Interpreter());
System.out.println("Output: " + result);
The visitor pattern allows us to do something very powerful: given an object hierarchy, it allows us to create modular operations that operate over the hierarchy without having to put the code in the hierarchy's classes themselves. The visitor pattern is used widely, for example, in compiler construction. Given the syntax tree of a particular program, many visitors are written that operate on that tree: type checking, optimizations, and machine code emission are all usually implemented as different visitors. In the case of the optimization visitor, it can even output a new syntax tree given the input tree.
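As a rough, self-contained sketch of that modularity, here are two independent visitors over the same small hierarchy. The constructors, the generic accept signature, the Printer visitor and TwoVisitorsDemo are assumptions of mine added to make the example runnable; the tree is built by hand instead of coming from a parser.

```java
interface IVisitor<T> {
    T visit(AdditionNode n);
    T visit(MultiplicationNode n);
    T visit(LiteralNode n);
}

abstract class Node {
    abstract <T> T accept(IVisitor<T> v);
}
abstract class BinaryNode extends Node {
    final Node left, right;
    BinaryNode(Node left, Node right) { this.left = left; this.right = right; }
}
class AdditionNode extends BinaryNode {
    AdditionNode(Node l, Node r) { super(l, r); }
    @Override <T> T accept(IVisitor<T> v) { return v.visit(this); }
}
class MultiplicationNode extends BinaryNode {
    MultiplicationNode(Node l, Node r) { super(l, r); }
    @Override <T> T accept(IVisitor<T> v) { return v.visit(this); }
}
class LiteralNode extends Node {
    final int value;
    LiteralNode(int value) { this.value = value; }
    @Override <T> T accept(IVisitor<T> v) { return v.visit(this); }
}

// Operation 1: evaluate the tree.
class Interpreter implements IVisitor<Integer> {
    public Integer visit(AdditionNode n)       { return n.left.accept(this) + n.right.accept(this); }
    public Integer visit(MultiplicationNode n) { return n.left.accept(this) * n.right.accept(this); }
    public Integer visit(LiteralNode n)        { return n.value; }
}

// Operation 2: render the tree as text. No node class had to change for this.
class Printer implements IVisitor<String> {
    public String visit(AdditionNode n)       { return "(" + n.left.accept(this) + " + " + n.right.accept(this) + ")"; }
    public String visit(MultiplicationNode n) { return "(" + n.left.accept(this) + " * " + n.right.accept(this) + ")"; }
    public String visit(LiteralNode n)        { return Integer.toString(n.value); }
}

class TwoVisitorsDemo {
    // Hand-built tree for 2 + (3 * 4).
    static Node sample() {
        return new AdditionNode(new LiteralNode(2),
                new MultiplicationNode(new LiteralNode(3), new LiteralNode(4)));
    }
    public static void main(String[] args) {
        Node tree = sample();
        System.out.println(tree.accept(new Printer()));      // prints "(2 + (3 * 4))"
        System.out.println(tree.accept(new Interpreter()));  // prints "14"
    }
}
```

Adding a third operation, say a type checker, would mean one more IVisitor implementation and no edits to the node classes.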
It has its drawbacks, of course: if we add a new type into the hierarchy, we also need to add a visit() method for that new type to the IVisitor interface, and create stub (or full) implementations in all of our visitors. We need to add the accept() method to the new type as well, for the reasons described above. If performance doesn't mean that much to you, there are solutions for writing visitors without needing accept(), but they normally involve reflection and thus can incur quite a large overhead.
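For completeness, here is one way such a reflection-based visitor might look. This is a sketch of the general idea only, not the technique from any particular paper or library: ReflectiveVisitor and its dispatch method are hypothetical, and the lookup simply walks up the superclass chain (interfaces are ignored) using the standard Class.getMethod and Method.invoke calls.

```java
class Node {}
class TrainNode extends Node {}
class PlaneNode extends Node {}   // deliberately has no dedicated visit overload

class ReflectiveVisitor {
    public String visit(Node node)      { return "visit(Node)"; }
    public String visit(TrainNode node) { return "visit(TrainNode)"; }

    // Picks the visit overload from the argument's dynamic type at runtime.
    // Note that the node classes need no accept() method at all.
    public String dispatch(Node node) {
        for (Class<?> c = node.getClass(); c != null; c = c.getSuperclass()) {
            try {
                java.lang.reflect.Method m = getClass().getMethod("visit", c);
                return (String) m.invoke(this, node);
            } catch (NoSuchMethodException e) {
                // No overload for this exact type; try the superclass next.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("no visit overload for " + node.getClass());
    }
}

class ReflectionDemo {
    public static void main(String[] args) {
        ReflectiveVisitor v = new ReflectiveVisitor();
        Node root = new TrainNode();
        System.out.println(v.dispatch(root));             // prints "visit(TrainNode)"
        System.out.println(v.dispatch(new PlaneNode()));  // prints "visit(Node)"
    }
}
```

Every dispatch here pays for a method lookup and a reflective invocation, and a missing overload only shows up at runtime, which is the trade-off against the accept() approach.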