I'm working with multiple ordered lists, each containing tuples of the form (key, value)
, where each list is sorted in ascending order by key
. I want to merge these lists into a single ordered list that includes every distinct key from the input lists. The merged output should be a list of tuples in the form:
(key, value_from_list1, value_from_list2, ..., value_from_listK)
with the following behavior:
Here is a complete and minimal example: Given two lists:
A = [(1, a1), (2, a2), (10, a3)]
B = [(1, b1), (8, b2), (10, b3)]
the desired output is:
C = [
(1, a1, b1), // At key 1, both lists provide values.
(2, a2, b1), // At key 2, list A updates to a2; list B still uses b1.
(8, a2, b2), // At key 8, list B updates to b2; list A remains at a2.
(10, a3, b3) // At key 10, both lists update.
]
This merging behavior should extend to more than two lists as well (e.g., merging 5 lists into tuples with 6 elements, where the first element is the key).
All input lists are assumed to contain the element (0, null)
, but we omit this element (i.e. (0, null, null, ..., null)
) from the final output. This ensures that, when merging, if a list doesn't have an entry for a given key, it carries forward the most recent value (initially null
). For example, merging A = [(1, a1)]
and B = [(2, b1)]
yields [(1, a1, null), (2, a1, b1)]
.
My Questions:
My current solution is using a sweeping line algorithm with multiple pointers—one for each list—to keep track of the most recent value as I iterate through the union of keys. If I am not mistaken, this runs in O(N log(N))
I´ve created pseudocode based on the suggestion from user Smylic made in this post. As he says:
If you need to output k values for every key, the most efficient algorithm takes Θ(k⋅u) of time, where u is the number of unique keys.
function mergeLists(lists):
k = number of lists
pointers = array of size k, all initialized to 0
lastValues = array of size k, initially set to None (or some default)
output = empty list
while true:
minKey = None
// Find the minimum key among the current pointers in all lists.
for i from 0 to k-1:
if pointers[i] < length(lists[i]):
currentKey = lists[i][pointers[i]].key
if minKey is None or currentKey < minKey:
minKey = currentKey
// If no minimum key was found, all lists are exhausted.
if minKey is None:
break
// Update lastValues for each list that has the current minKey
for i from 0 to k-1:
if pointers[i] < length(lists[i]) and lists[i][pointers[i]].key == minKey:
lastValues[i] = lists[i][pointers[i]].value
pointers[i] = pointers[i] + 1
// Append the merged tuple (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1])
output.append( (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1]) )
return output
If there are n
distinct keys and m
lists, the total output will be O(n*m)
. Therefore you cannot be faster than that. Your current algorithm is already that speed, so the most that you can improve is a constant factor.
It is possible to find an improvement when most keys do not appear in most lists. But it will add an extra log(m)
factor if most keys appear in most lists.
So your algorithm seems good enough to me.