
Algorithm to simplify a weighted directed graph of debts


I've been using a little Python script I wrote to manage debts among my roommates. It works, but some features are missing, one of which is simplifying unnecessarily complicated debt structures. For example, suppose the following weighted directed graph represents some people, and the arrows represent debts between them (Alice owes Bob $20 and Charlie $5, Bob owes Charlie $10, etc.):

graph1

It is clear that this graph should be simplified to the following graph:

graph1-simplified

There's no sense in $10 making its way from Alice to Bob and then from Bob to Charlie if Alice could just give it to Charlie directly.

The goal, then, in the general case is to take a debt graph and simplify it (i.e. produce a new graph with the same nodes but different edges) such that

  1. No node has edges pointing both in and out of it (no useless money changing hands)
  2. All nodes have the same "flow" through them as they did in the original graph (it is identical in terms of where the money ends up).

By "flow", I mean the value of all inputs minus all outputs (is there a technical term for this? I am no graph theory expert). So in the example above, the flow values for each node are:

You can see that the first and second graphs have the same flow through each node, so this is a good solution. There are some other easy cases, for example, any cycle can be simplified by removing the lowest valued edge and subtracting its value from all other edges.

This:

graph2

should be simplified into this:

graph2-simplified
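A minimal sketch of that cycle rule, assuming debts are stored as a dict keyed by `(debtor, creditor)` pairs. The function name `reduce_cycle` and the amounts are made up for illustration; they are not the values from the picture.

```python
def reduce_cycle(debts, cycle):
    """Subtract the smallest debt on `cycle` from every edge of the cycle.

    debts: dict mapping (debtor, creditor) -> amount owed
    cycle: list of nodes, e.g. ["A", "B", "C"] meaning A->B, B->C, C->A
    """
    edges = [(cycle[i], cycle[(i + 1) % len(cycle)]) for i in range(len(cycle))]
    smallest = min(debts[e] for e in edges)
    result = dict(debts)  # work on a copy, leave the input untouched
    for e in edges:
        result[e] -= smallest
        if result[e] == 0:
            del result[e]  # a fully cancelled debt disappears from the graph
    return result


# Hypothetical amounts, chosen only to show the effect.
debts = {("A", "B"): 10, ("B", "C"): 5, ("C", "A"): 7}
simplified = reduce_cycle(debts, ["A", "B", "C"])
print(simplified)  # {('A', 'B'): 5, ('C', 'A'): 2}
```

Every node on the cycle loses the same amount on its incoming and outgoing edge, so its flow is unchanged.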

I can't imagine that no one has studied this problem; I just don't know what terms to search for to find info on it (again, not a graph theory expert). I've been looking for several hours to no avail, so my question is this: what is an algorithm that will produce a simplification (new graph) according to the conditions specified above for any weighted directed graph?


Solution

  • Simple algorithm

    In time linear in the number of debts, you can compute how much money each person expects to receive or has to pay in total. So you could simply create two lists, one for debit and the other for credit, and then balance the heads of the two lists against each other until both are empty. From your first example:

    The transactions define the edges of your new graph. For n persons involved, there will be at most n-1 transactions (edges): in the beginning, the total length of both lists is at most n; in each step, at least one of the lists (debit/credit) gets shorter by one; and in the last step both lists become empty at once.
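A rough sketch of the two-list idea in Python. The names `net_balances` and `settle` are mine, and the input uses only the three debts spelled out in the question text (the pictured graph may contain more), with amounts kept as integers to avoid rounding trouble.

```python
from collections import defaultdict


def net_balances(debts):
    """Net amount per person: positive = is owed money, negative = owes money."""
    balance = defaultdict(int)
    for (debtor, creditor), amount in debts.items():
        balance[debtor] -= amount
        balance[creditor] += amount
    return dict(balance)


def settle(balance):
    """Match the ends of the debit and credit lists until both are empty."""
    debtors = [[p, -b] for p, b in balance.items() if b < 0]
    creditors = [[p, b] for p, b in balance.items() if b > 0]
    payments = []
    while debtors and creditors:
        debtor, creditor = debtors[-1], creditors[-1]
        amount = min(debtor[1], creditor[1])
        payments.append((debtor[0], creditor[0], amount))
        debtor[1] -= amount
        creditor[1] -= amount
        # At least one of the two is now settled and leaves its list.
        if debtor[1] == 0:
            debtors.pop()
        if creditor[1] == 0:
            creditors.pop()
    return payments


# Only the debts named in the question text; an assumption, not the full picture.
debts = {("Alice", "Bob"): 20, ("Alice", "Charlie"): 5, ("Bob", "Charlie"): 10}
balance = net_balances(debts)
payments = settle(balance)
print(balance)   # {'Alice': -25, 'Bob': 10, 'Charlie': 15}
print(payments)  # [('Alice', 'Charlie', 15), ('Alice', 'Bob', 10)]
```

Which pairs get matched depends on the order of the lists, so this is one valid settlement among several, not necessarily the one with the fewest edges.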

    The issue is that, in general, this graph does not have to resemble the original graph, which, as I understand your intention, is a requirement. (Is it? There are cases where the optimal solution requires adding new edges. Imagine A owes B and B owes C the same amount of money: A should pay C directly, but this edge is not in the graph of debts.)

    Fewer transactions

    If the goal is just to construct an equivalent graph, you could search the creditor and debtor lists (as in the section above) for exact matches, or for cases where the credits of several people add up to the debt of one person (or the other way round). Search for "bin packing". In all other cases you have no choice but to split the flows, but even the simple algorithm above produces a graph with at most one fewer edge than there are persons involved.

    EDIT: Thanks to j_random_hacker for pointing out that a solution with fewer than n-1 edges is possible iff there is a group of persons whose total debt matches the total credit of another group of persons. Then the problem can be split into two subproblems with a total cost of at most n-2 edges for the transaction graph. Unfortunately, the subset sum problem is NP-hard.
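To make the splitting condition concrete, here is a brute-force sketch that looks for a group of people whose net balances cancel out. The name `find_zero_sum_group` and the sample balances are hypothetical, the input is assumed to sum to zero overall, and the search is exponential in the number of people, in line with the NP-hardness noted above.

```python
from itertools import combinations


def find_zero_sum_group(balance):
    """Return a group whose balances sum to zero, leaving a non-empty rest, or None.

    balance: dict person -> net amount (positive = creditor, negative = debtor),
    assumed to sum to zero over everybody.
    """
    people = [p for p, b in balance.items() if b != 0]
    # A useful group has at least 2 members and leaves at least 2 people over.
    for size in range(2, len(people) - 1):
        for group in combinations(people, size):
            if sum(balance[p] for p in group) == 0:
                return set(group)
    return None


# Hypothetical balances: A/B and C/D can be settled independently.
splittable = {"A": -10, "B": 10, "C": -7, "D": 7}
print(find_zero_sum_group(splittable))  # {'A', 'B'} (set order may vary)

# No proper subgroup cancels out, so n-1 = 2 payments are needed.
unsplittable = {"A": -10, "B": 4, "C": 6}
print(find_zero_sum_group(unsplittable))  # None
```

If a group is found, each part can be settled on its own, and the split can be repeated recursively on both parts.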

    A flow problem?

    Perhaps this can also be transformed into a min-cost flow problem. If you just want to simplify your original graph, you construct a flow on it, where the edge capacities are the original amounts of debit/credit. The debtors serve as inflow nodes (fed by a connector node that has an edge to every debtor, with capacity equal to that debtor's total debt), and the creditors serve as outflow nodes (draining into a similar connector node).

    If you want to minimize the number of transactions, you will prefer keeping the "big" transactions and eliminating the "small" ones. Hence, the cost of each edge could be modeled as the inverse of the flow on that edge.