algorithmtreetime-complexitylowest-common-ancestor

Palindromes in a tree


I am looking at this challenge:

Given a tree with N nodes and N-1 edges. Each edge on the tree is labelled by a string of lowercase letters from the Latin alphabet. Given Q queries, consisting of two nodes u and v, check if it is possible to make a palindrome string which uses all the characters that belong to the string labelled on the edges in the path from node u to node v.

Characters can be used in any order.

N is of the order of 105 and Q is of the order of 106

Input:

N=3
u=1 v=3 weight=bc
u=1 v=2 weight=aba
Q=4
u=1 v=2
u=2 v=3
u=3 v=1
u=3 v=3

Output:

YES
YES
NO
NO

What I thought was to compute the LCA between 2 nodes by precomputation in O(1) using sparse table and Range minimum query on Euler tower and then see the path from LCA to node u and LCA to node v and store all the characters frequency. If the sum of frequency of all the characters is odd, we check if the frequency of each character except one is odd. If the sum of frequency of all the characters is even, we check if the frequency of each character is even. But this process will surely time out because Q can be upto 106.

Is there anyone with a better algorithm?


Solution

  • Preparation Step

    Prepare your data structure as follows:

    For each node get the path to the root, get all letters on the path, and only retain a letter when it occurs an odd number of times on that path. Finally encode that string with unique letters as a bit pattern, where bit 0 is set when there is an "a", bit 1 is set when there is a "b", ... bit 25 is set when there is a "z". Store this pattern with the node.

    This preprocessing can be done with a depth-first recursive procedure, where the current node's pattern is passed down to the children, which can apply the edge's information to that pattern to create their own pattern. So this preprocessing can run in linear time in terms of the total number of characters in the tree, or more precisely O(N+S), where S represents that total number of characters.

    Query Step

    When a query is done perform the bitwise XOR on the two involved patterns. If the result is 0 or it has only one bit set, return "YES", else return "NO". So the query will not visit any other nodes than just the two ones that are given, look up the two patterns and perform their XOR and make the bit test. All this happens in constant time for one query.

    The last query given in the question shows that the result should be "NO" when the two nodes are the same node. This is a boundary case, as it is debatable whether an empty string is a palindrome or not. The above XOR algorithm would return "YES", so you would need a specific test for this boundary case, and return "NO" instead.

    Explanation

    This works because if we look at the paths both nodes have to the root, they may share a part of their path. The characters on that common path should not be considered, and the XOR will make sure they aren't. Where the paths differ, we actually have the edges on the path from the one node to the other. There we see the characters that should contribute to a palindrome.

    If a character appears an even number of times in those edges, it poses no problem for creating a palindrome. The XOR makes sure those characters "disappear".

    If a character appears an odd number of times, all but one can mirror each other like in the even case. The remaining one can only be used in an odd-length palindrome, and only in the centre position of it. So there can only be one such character. This translates to the test that the XOR result is allowed to have 1 bit set (but not more).

    Implementation

    Here is an implementation in JavaScript. The example run uses the input as provided in the question. I did not bother to turn the query results from boolean to NO/YES:

    function prepare(edges) {
        // edges: array of [u, v, weight] triplets
        // Build adjacency list from the list of edges
        let adjacency = {};
        for (let [u, v, weight] of edges) {
            // convert weight to pattern, as we don't really need to 
            //   store the strings
            let pattern = 0;
            for (let i = 0; i < weight.length; i++) {
                let ascii = weight.charCodeAt(i) - 97;
                pattern ^= 1 << ascii; // toggle bit that corresponds to letter
            }
            if (v in adjacency && u in adjacency) throw "Cycle detected!";
            if (!(v in adjacency)) adjacency[v] = {};
            if (!(u in adjacency)) adjacency[u] = {};
            adjacency[u][v] = pattern;
            adjacency[v][u] = pattern;
        }
        
        // Prepare the consolidated path-pattern for each node
        let patterns = {}; // This is the information to return
    
        function dfs(u, parent, pathPattern) {
            patterns[u] = pathPattern;
            for (let v in adjacency[u]) {
                // recurse into the "children" (the parent is not revisited)
                if (v !== parent) dfs(v, u, adjacency[u][v] ^ pathPattern);
            }
        }
        
        // Start a DFS from an arbitrary node as root
        dfs(edges[0][0], null, 0);
        return patterns;
    }
    
    function query(nodePatterns, u, v) {
        if (u === v) return false; // Boundary case.
        let pattern = nodePatterns[u] ^ nodePatterns[v];
        // "smart" test to verify that at most 1 bit is set
        return pattern === (pattern & -pattern); 
    }
    
    // Example:
    let edges = [[1, 3, "bc"], [1, 2, "aba"]];
    let queries = [[1, 2], [2, 3], [3, 1], [3, 3]];
    
    let nodePatterns = prepare(edges);
    for (let [u, v] of queries) {
        console.log(u, v, query(nodePatterns, u, v));
    }