c++algorithm bit-manipulation graph-theory directed-acyclic-graphs

Summation of nodes reachable starting from every node present in given DAG with restriction on number of childs per node

That's a problem from XXX Polish Olympiad in Informatics, stage II, titled "Wspinaczka". Link to original problem statement in Polish.

Let's compress above story into algorithmic problem statement:

Given Directed Acyclic Graph (DAG) g, for every vertex p in this graph g calculate the sum of all reachable nodes if we start traversing from p. The values of nodes are given as array, where i-th value corresponds to i-th vertex.

It's known that when we are at vertex i, then directed edge from node i to node j exists only if j > i.

Also, we are given k, 1 <= k <= 8 and following condition must hold: j - i <= k

There are n nodes, 1 <= n <= 100 000 and m edges 1 <= m <= 800 000

That's how I started:

#include <iostream>
#include <vector>
#include <queue>
#include <algorithm>

using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);

    int n, m, k;
    cin >> n >> m >> k;

    // values of nodes
    vector<int> beauty(n);
    for (int i = 0; i < n; ++i)
        cin >> beauty[i];

    vector<vector<int>> adj_list(n);
    vector<vector<int>> reversed_adj_list(n);
    vector<int> outgoing_edges(n);
    for (int i = 0; i < m; ++i)
    {
        int a, b;
        cin >> a >> b;

        --a;
        --b;

        adj_list[a].push_back(b);
        reversed_adj_list[b].push_back(a);
        ++outgoing_edges[a];
    }

    queue<int> q;
    for (int i = 0; i < n; ++i)
    {
        if (outgoing_edges[i] == 0)
        {
            q.push(i);
        }
    }

    vector<long long> dp(n);
    while (!q.empty())
    {
        int node = q.front();
        q.pop();

        dp[node] = beauty[node];
        for (const int child : adj_list[node])
        {
            dp[node] += dp[child];
        }

        for (const int parent : reversed_adj_list[node])
        {
            --outgoing_edges[parent];

            if (outgoing_edges[parent] == 0)
            {
                q.push(parent);
            }
        }
    }

    for (int i = 0; i < n; ++i)
        cout << dp[i] << '\n';
}

So I perform topological sorting (firstly I process n node, then n-1, then n-2 node and so on because we know that there CAN BE edge between node i and j only if j > i).

While this topological sort, I'm calculating dp[i] like this: dp[i] = value[i] + sum_of_values_of_all_childs. The problem is that this overcounts. Why?

See graph below (red numbers are values):

For node 1, we are summing up values of all childs and their subgraphs, but both of these childs already have counted node 4. So instead of result 13 when I start at node 1, calculated answer is 14 because I counted 4 node twice.

I have no idea how to avoid this situation, and I didn't make use of small value of k. I know there exists solution which uses dp[node][mask], where mask is up to 2^K, but I'm unable to find it out. The time complexity of this solution is O(n * 2^K + m) (m because of reading input).

Solution

Let's process/build the DAG from vertex N to vertex 1. At each point when processing a vertex i, let's store dp[mask]: the sum of values of vertices which can be reached exactly from the masked subset of vertices {i+1...i+k}. Summing over reachable masks from i gives answers to the problem.

We just need to recalculate the DP after adding vertex i, i.e. calculate dp_new[mask_new] for masks of {i...i+k-1}. Note that each vertex is only counted once in the DP, so we just need to calculate the uniquely given new mask for every dp[mask] and add that sum of weights to updated DP array. For every mask, we know that the first k-1 bits of mask are the last k-1 bits of mask_new, so that's just a shift; all k bits tell us if we're dealing with vertices reachable from i as well (they determine the first bit of new mask), and k-th bit isn't used for anything else. Finally we add the weight of vertex i to dp_new[100...0].

With O(1) bitwise operations used for all calculations with masks, this has the desired time complexity.