python python-3.x algorithm nested-lists closest-points

Algorithm, closest point between list elements

I have n ordered lists of unequal size (where I do not know beforehand how many lists there are going to be). I need to find the minimum average distance between one element in each list.

For example given n=3 for three lists:

a = [14, 22, 36, 48]
b = [14, 23, 30, 72]
c = [1, 18, 24]

The output should be (22,23,24) because:

mean(abs(22-23), abs(23-24), abs(22-24)) = 1.33333

which is the smallest among all the points in the example above.

I tried to implement it in Python as following

def aligner(aoa):
'''
read arrays of arrays of peaks and return closest peaks
'''
#one of arrays is empty
if not [y for x in aoa for y in x]:
    return None
# there is the same nr in all array no need to do anything
candidate = set.intersection(*map(set, aoa))
if candidate:
    # returns intersect
    return [max(list(candidate))] * len(aoa)
else:
    #tried cartesian product via bumpy malloc err
    pass

My doubt is now regarding the implementation of the other part. I thought about using the cartesian product for generating all combination but is going into memory issues. My guesses would be do generate all the combination somehow (maybe itertools??) and loop through all of those but I don't know if there s any algorithm that tackles this problem that I could use.

I do not need code but only hints whether there is any efficient way for solving this or the brute force with n for loops on the permutated lists is the only one

EDIT

Regarding the size of the problem the nr of list is maximum 100 (fixed) while the nr of elements can be vary but I would say that the presented example with 4 or 5 points per list is a realistic scenario.

All points are non-negative.

Tried the proposed itertools solution but of course not memory issues but has been running since hours and it's stuck on the 3rd element.

Solution

This method is a brute force method, but uses a method of elimination similar to Dijkstra's algorithm, that results in far fewer cases (making an algorithm that is most likely orders of magnitude faster, especially for big lists or lots of lists). Tell me if you don't understand it, and I can clarify. The implementation can be found here: https://github.com/nerryoob/closestPoint

What you are doing is making a list of the different combination of numbers (i.e. answers)? Best at the start (index 0), worst at the end, or vice versa, see what works best. You'll make the result list for the first input list only, completely ignoring the others. For one list, of course, all the items are the solution - they all have total difference 0. So just copy the first input list into the result list

Next, probably with a while loop, follow this algorithm. Take the top item and pop it from the result list. Store its value. Go to the next input list and for each item in that next input list, make a copy of the top item you just popped that also has the item of the next input list. Find the new overall difference and insert the new item based off of that into the list. Repeat until the top solution has all of the lists in. This means that you guarantee you have the best solution (of at least joint 1st), whilst spending far less time on the combinations that are obviously not the solution

Example (the number in brackets is the total difference)

[14, 22, 36, 48] [14, 23, 30, 72] [1, 18, 24]

Result list is [14(0), 22(0), 36(0), 48(0)]

Look at 14. Insert the new numbers [14 and 14(0), 22(0), 36(0), 48(0), 14 and 23(9), 14 and 30 (16), 14 and 72 (58)]
Look at 14 & 14. Insert the new numbers [22(0), 36(0), 48(0), 14 and 14 and 18(8), 14 and 23 (9),14 and 30 (16), 14 and 14 and 24 (20), 14 and 14 and 1(26), 14 and 72(58)]
Look at 22. Insert the new numbers [36(0), 48(0), 22 and 23(1), 14 and 14 and 18(8), 22 and 14(8), 22 and 30(8), 14 and 23 (9),14 and 30 (16), 14 and 14 and 24 (20), 14 and 14 and 1(26), 22 and 72(50), 14 and 72(58)]

Keep on repeating and you end up with 22, 23, 24 on top. Because this has all n lists in it, you can therefore stop and give back the answer

To optimise it:

Remove duplicates
Perhaps exploit ordered lists in some way
Think about where you put items with the same total difference, perhaps items with more numbers coming first

EDIT: Algorithmic complexity is O(n^2)