I have a bunch of lists I need to combine. Each of these lists has an ordering, and each ordering is consistent with the ordering of an original list. (By consistent, I mean that each list can be reproduced by dropping items from the original list, without rearranging any items.)
My problem is that I don't have the original list, and I have to reproduce it as closely as possible using the partial lists I do have.
For example, consider the following ordered lists:
a = ["first", "fourth", "fifth", "sixth", "eighth", "sophomore", "junior"]
b = ["second", "third", "fourth", "sixth", "seventh", "eighth", "freshman", "sophomore", "senior"]
c = ["first", "second", "freshman", "sophomore", "junior", "senior"]
...
partial_lists = [a, b, c, ...]
From these three lists, it's possible to recover the original list. In some cases, however, that may not be possible. Either way, I want to create a list merged_list
such that merged_list
preserves the order of each of the partial lists. (That is, any specified partial list could theoretically be reconstructed from merged_list
using only merged_list.remove()
operations.) It's safe to assume that each partial list contains no duplicates, and merged_list
must not contain any duplicates either.
For this example, merged_list
would be ["first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth", "freshman", "sophomore", "junior", "senior"]
Is there an efficient algorithm that can handle this for an arbitrary number of partial lists?
This is about topological sorting, given some dependencies.
There are several ways to find a topological sorted original list. It will help to build a graph data structure that has all the dependencies keyed by nodes. Then yield the nodes that don't have any predecessor, and remove them from the graph. This will give a new set of nodes that don't have any predecessor, ...etc.
Here is an implementation in Python:
from collections import defaultdict
def topological_sort(lists):
# create graph (pred + succ)
succ = defaultdict(set)
pred = defaultdict(set)
nodes = set()
for lst in lists:
nodes.update(lst)
for a, b in zip(lst, lst[1:]):
succ[a].add(b)
pred[b].add(a)
# get nodes that don't have a predecessor
first = { node for node in nodes if not pred[node] }
while first:
collect = list(first)
first.clear()
yield from collect # yield the nodes that don't have a predecessor (anymore)
for a in collect:
for b in succ[a]:
pred[b].remove(a) # those can't be a predecessor anymore
if not pred[b]: # this may make another node to be without predecessor
first.add(b)
# demo run:
a = ["first", "fourth", "fifth", "sixth", "eighth", "sophomore", "junior"]
b = ["second", "third", "fourth", "sixth", "seventh", "eighth", "freshman", "sophomore", "senior"]
c = ["first", "second", "freshman", "sophomore", "junior", "senior"]
partial_lists = [a, b, c]
result = list(topological_sort(partial_lists))
For this example run, the result
will come out as:
['first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh', 'eighth', 'freshman', 'sophomore', 'junior', 'senior']