I have n ordered lists of unequal size (where I do not know beforehand how many lists there are going to be). I need to find the minimum average distance between one element in each list.
For example given n=3 for three lists:
a = [14, 22, 36, 48]
b = [14, 23, 30, 72]
c = [1, 18, 24]
The output should be (22,23,24) because:
mean(abs(22-23), abs(23-24), abs(22-24)) = 1.33333
which is the smallest among all the points in the example above.
I tried to implement it in Python as following
def aligner(aoa):
'''
read arrays of arrays of peaks and return closest peaks
'''
#one of arrays is empty
if not [y for x in aoa for y in x]:
return None
# there is the same nr in all array no need to do anything
candidate = set.intersection(*map(set, aoa))
if candidate:
# returns intersect
return [max(list(candidate))] * len(aoa)
else:
#tried cartesian product via bumpy malloc err
pass
My doubt is now regarding the implementation of the other part. I thought about using the cartesian product for generating all combination but is going into memory issues. My guesses would be do generate all the combination somehow (maybe itertools??) and loop through all of those but I don't know if there s any algorithm that tackles this problem that I could use.
I do not need code but only hints whether there is any efficient way for solving this or the brute force with n for loops on the permutated lists is the only one
EDIT
Regarding the size of the problem the nr of list is maximum 100 (fixed) while the nr of elements can be vary but I would say that the presented example with 4 or 5 points per list is a realistic scenario.
All points are non-negative.
Tried the proposed itertools solution but of course not memory issues but has been running since hours and it's stuck on the 3rd element.
This method is a brute force method, but uses a method of elimination similar to Dijkstra's algorithm, that results in far fewer cases (making an algorithm that is most likely orders of magnitude faster, especially for big lists or lots of lists). Tell me if you don't understand it, and I can clarify. The implementation can be found here: https://github.com/nerryoob/closestPoint
What you are doing is making a list of the different combination of numbers (i.e. answers)? Best at the start (index 0), worst at the end, or vice versa, see what works best. You'll make the result list for the first input list only, completely ignoring the others. For one list, of course, all the items are the solution - they all have total difference 0. So just copy the first input list into the result list
Next, probably with a while loop, follow this algorithm. Take the top item and pop it from the result list. Store its value. Go to the next input list and for each item in that next input list, make a copy of the top item you just popped that also has the item of the next input list. Find the new overall difference and insert the new item based off of that into the list. Repeat until the top solution has all of the lists in. This means that you guarantee you have the best solution (of at least joint 1st), whilst spending far less time on the combinations that are obviously not the solution
Example (the number in brackets is the total difference)
[14, 22, 36, 48] [14, 23, 30, 72] [1, 18, 24]
Result list is [14(0), 22(0), 36(0), 48(0)]
Keep on repeating and you end up with 22, 23, 24 on top. Because this has all n lists in it, you can therefore stop and give back the answer
To optimise it:
EDIT: Algorithmic complexity is O(n^2)