algorithmscalalongest-path

Finding the longest path in the Sling Blade Runner puzzle


I've been trying to solve an archived ITA Software puzzle known as Sling Blade Runner for a few days now. The gist of the puzzle is as follows:

"How long a chain of overlapping movie titles, like Sling Blade Runner, can you find?"

Use the following listing of movie titles: MOVIES.TXT. Multi-word overlaps, as in "License to Kill a Mockingbird," are allowed. The same title may not be used more than once in a solution. Heuristic solutions that may not always produce the greatest number of titles will be accepted: seek a reasonable tradeoff of efficiency and optimality.

The file MOVIES.TXT contains 6561 movie titles in alphabetical order.

My attempt at a solution has several parts.

Graph Construction:

What I did was map every movie title to every other movie title it could chain to (on it's right). What I end up with as my graph is a Map[String, List[String]]. You can see the graph that was built using this process here.

Graph Traversal:

I did a Depth First Search using every node (every key in the map) as a starting node of the search. I kept track of the depth at which each node was visited during the search, and this was tagged in the nodes returned by the DFS. What I ended up with was a List[List[Node]] where every List[Node] in the List was the DFS tree from a particular search.

Finding the longest chain:

I took the results of all the graph traversals in the previous step, and for every List[Node] I sorted the list by the depth values I tagged the nodes with previously, in descending order. Then starting with the head of the list (which gives me the deepest node visited in the DFS) I backtrack through the nodes to build a chain. This gave me a List[List[String]] where every List[String] in the List was the longest chain for that particular DFS. Sorting the List[List[String]] by the size of each List[String] and grabbing the head then gave me the largest chain.

Results:

The longest chain found with my algorithm was 217 titles long. The output can be viewed here.

I've only been able to find a few other attempts by Googling, and it seems every other attempt has produced longer chains than what I was able to accomplish. For example this post states that Eric Burke found a chain 245 titles long, and a user by the name of icefox on Reddit found a chain that was 312 titles long.

I can't think of where my algorithm is failing to find the longest chain, given other people have found longer chains. Any help/guidance is much appreciated. If you'd like to review my code, it can be found here (it's written in Scala and I just started learning Scala so forgive me if I made some noob mistakes).

Update:

I made some changes to my algorithm, which now finds chains of length 240+. See here


Solution

  • The issue is that, since the movie graph (I'm assuming) has cycles, no matter how you assign depths to the vertices of the cycle, there exists a subpath that is not monotone in the depth and thus is not considered by your algorithm. Sling Blade Runner is NP-hard, since we want no, so no known polynomial-time strategy is going to produce optimal solutions on every input.

    (Sling Blade Runner isn't quite the NP-hard longest path problem, which specifies paths with no repeated vertices instead of no repeated arcs, but there is an easy polynomial-time reduction from the latter to the former. Split each vertex v into v_in -> v_out, moving arc heads to the in vertex and arc tails to the out vertex. Make additional arcs from a source vertex to another source vertex to each in vertex, and from each out vertex to a sink vertex to another sink vertex.

    To find the longest path on the graph a->b, b->c, c->a, c->d, the input to Sling Blade Runner would be

    s1->s2,
    s2->a_in, s2->b_in, s2->c_in, s2->d_in,
    a_in->a_out, b_in->b_out, c_in->c_out, d_in->d_out,
    a_out->b_in, b_out->c_in, c_out->a_in, c_out->d_in,
    a_out->t1, b_out->t1, c_out->t1, d_out->t1,
    t1->t2.
    

    The longest path problem forbids repeated vertices, so the optimal solution is a->b->c->d rather than c->a->b->c->d. The corresponding chain in Sling Blade Runner is s1->s2->a_in->a_out->b_in->b_out->c_in->c_out->d_in->d_out->t1->t2. The corresponding transformation of the path with a repeated vertex would repeat the arc c_in->c_out and thus be infeasible for Sling Blade Runner.)

    Suppose that the movies titles are

    S A
    S B
    A B
    A E
    B C
    C D
    D A
    E F
    

    , so that the graph looks like

        F
        ^
        |
        E
        ^
        |
    S-->A-->B<--
    |   ^   |   \
    |   |   v   |
    |   D<--C   |
    \___________/
    

    We start the DFS from S and get the following tree (because I said so; this is not the only possible DFS tree).

    S-->A-->B-->C-->D
         \
          ->E-->F
    

    . The depths are

    S 0
    A 1
    B 2
    C 3
    D 4
    E 2
    F 3
    

    , so the longest depth-monotone path is S A B C D. The longest path is S B C D A E F. If you start the DFS elsewhere, then you won't even assign S a depth.

    A simpler example is

    A B
    B C
    C D
    D A
    

    , where, no matter where you start, the optimal path, that goes all the way around the cycle, is not depth-monotone: A B C D A or B C D A B or C D A B C or D A B C D.