I've been trying to solve an archived ITA Software puzzle known as Sling Blade Runner for a few days now. The gist of the puzzle is as follows:
"How long a chain of overlapping movie titles, like Sling Blade Runner, can you find?"
Use the following listing of movie titles: MOVIES.TXT. Multi-word overlaps, as in "License to Kill a Mockingbird," are allowed. The same title may not be used more than once in a solution. Heuristic solutions that may not always produce the greatest number of titles will be accepted: seek a reasonable tradeoff of efficiency and optimality.
The file MOVIES.TXT contains 6561 movie titles in alphabetical order.
My attempt at a solution has several parts.
What I did was map every movie title to every other movie title it could chain to (on it's right). What I end up with as my graph is a Map[String, List[String]]
. You can see the graph that was built using this process here.
I did a Depth First Search using every node (every key in the map) as a starting node of the search. I kept track of the depth at which each node was visited during the search, and this was tagged in the nodes returned by the DFS. What I ended up with was a List[List[Node]]
where every List[Node]
in the List was the DFS tree from a particular search.
I took the results of all the graph traversals in the previous step, and for every List[Node]
I sorted the list by the depth values I tagged the nodes with previously, in descending order. Then starting with the head of the list (which gives me the deepest node visited in the DFS) I backtrack through the nodes to build a chain. This gave me a List[List[String]]
where every List[String]
in the List was the longest chain for that particular DFS. Sorting the List[List[String]]
by the size of each List[String]
and grabbing the head then gave me the largest chain.
The longest chain found with my algorithm was 217 titles long. The output can be viewed here.
I've only been able to find a few other attempts by Googling, and it seems every other attempt has produced longer chains than what I was able to accomplish. For example this post states that Eric Burke found a chain 245 titles long, and a user by the name of icefox on Reddit found a chain that was 312 titles long.
I can't think of where my algorithm is failing to find the longest chain, given other people have found longer chains. Any help/guidance is much appreciated. If you'd like to review my code, it can be found here (it's written in Scala and I just started learning Scala so forgive me if I made some noob mistakes).
I made some changes to my algorithm, which now finds chains of length 240+. See here
The issue is that, since the movie graph (I'm assuming) has cycles, no matter how you assign depths to the vertices of the cycle, there exists a subpath that is not monotone in the depth and thus is not considered by your algorithm. Sling Blade Runner is NP-hard, since we want no, so no known polynomial-time strategy is going to produce optimal solutions on every input.
(Sling Blade Runner isn't quite the NP-hard longest path problem, which specifies paths with no repeated vertices instead of no repeated arcs, but there is an easy polynomial-time reduction from the latter to the former. Split each vertex v
into v_in -> v_out
, moving arc heads to the in vertex and arc tails to the out vertex. Make additional arcs from a source vertex to another source vertex to each in vertex, and from each out vertex to a sink vertex to another sink vertex.
To find the longest path on the graph a->b, b->c, c->a, c->d
, the input to Sling Blade Runner would be
s1->s2,
s2->a_in, s2->b_in, s2->c_in, s2->d_in,
a_in->a_out, b_in->b_out, c_in->c_out, d_in->d_out,
a_out->b_in, b_out->c_in, c_out->a_in, c_out->d_in,
a_out->t1, b_out->t1, c_out->t1, d_out->t1,
t1->t2.
The longest path problem forbids repeated vertices, so the optimal solution is a->b->c->d
rather than c->a->b->c->d
. The corresponding chain in Sling Blade Runner is s1->s2->a_in->a_out->b_in->b_out->c_in->c_out->d_in->d_out->t1->t2
. The corresponding transformation of the path with a repeated vertex would repeat the arc c_in->c_out
and thus be infeasible for Sling Blade Runner.)
Suppose that the movies titles are
S A
S B
A B
A E
B C
C D
D A
E F
, so that the graph looks like
F
^
|
E
^
|
S-->A-->B<--
| ^ | \
| | v |
| D<--C |
\___________/
We start the DFS from S
and get the following tree (because I said so; this is not the only possible DFS tree).
S-->A-->B-->C-->D
\
->E-->F
. The depths are
S 0
A 1
B 2
C 3
D 4
E 2
F 3
, so the longest depth-monotone path is S A B C D
. The longest path is S B C D A E F
. If you start the DFS elsewhere, then you won't even assign S
a depth.
A simpler example is
A B
B C
C D
D A
, where, no matter where you start, the optimal path, that goes all the way around the cycle, is not depth-monotone: A B C D A
or B C D A B
or C D A B C
or D A B C D
.