python, algorithm, optimization, large-data, complex-numbers

How to select elements from two lists of complex numbers so that the magnitude of their sum is minimized


I have two Python lists (list1 and list2), each containing 51 complex numbers. At each index i, I can choose either list1[i] or list2[i]. I want to select one element per index (51 elements in total) such that the magnitude of the sum of the selected complex numbers is minimized.

Sample code I have tried below:

import random
import itertools
import math

 
list1 = [complex(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(51)]
list2 = [complex(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(51)]

min_magnitude = math.inf
for choice in itertools.product([0, 1], repeat=51):
    current = [list1[i] if bit == 0 else list2[i] for i, bit in enumerate(choice)]
    total = sum(current)
    mag = abs(total)
    if mag < min_magnitude:
        min_magnitude = mag
        best_combination = current
        
print(min_magnitude)
print(best_combination)

It takes a very long time to run. Is there any alternative or technique to get the exact answer faster?


Solution

  • Let's slightly reinterpret your problem.

    Instead of strictly choosing one element from either of the two lists, start with an initial state where you've selected all elements from list1. The sum of this selection is your offset value, which we'll call C. Next, create a new list called list3, defined as list3 = list2 - list1. With this formulation, your problem becomes choosing any number of elements from list3 whose sum closely approximates -C. This is precisely the Subset Sum Problem.
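
    To make the equivalence concrete, here is a toy brute-force sketch of the reformulation (just to show the two problems are the same, not the efficient algorithm):

```python
import itertools
import math
import random

random.seed(0)
n = 6  # small toy size; the real problem has n = 51
list1 = [complex(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(n)]
list2 = [complex(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(n)]

# Offset: the sum when every element is taken from list1.
C = sum(list1)
# Swapping index i from list1[i] to list2[i] adds (list2[i] - list1[i]) to the sum.
list3 = [b - a for a, b in zip(list1, list2)]

# Choosing a subset S of list3 gives a total of C + sum(S), so minimizing
# |C + sum(S)| means finding a subset sum as close as possible to -C.
subsets = itertools.chain.from_iterable(
    itertools.combinations(range(n), k) for k in range(n + 1)
)
best = min(subsets, key=lambda s: abs(C + sum(list3[i] for i in s)))
print(best, abs(C + sum(list3[i] for i in best)))
```

    The subset found this way corresponds exactly to "take list2 at the indices in the subset, list1 everywhere else" in the original formulation.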

    Although this is computationally intensive, there are several methods significantly more efficient than brute force. A well-known approach to the subset sum problem (also suggested by @Robert) is the "meet-in-the-middle" technique.

    This technique splits the original set into two halves and enumerates all possible combinations within each half. Although that part is still brute force, it becomes practical because each half is only half the original size (2^25 and 2^26 combinations instead of 2^51). Once all combinations have been computed, the next step is to select one item from each half such that the two sums most nearly cancel each other out. Several methods exist for this, but I prefer scipy's KDTree for simplicity.

    Here's a naive implementation to help understand the logic. (Note: it keeps every combination in memory, so it won't work for n=51.)

    import itertools

    import numpy as np
    from scipy.spatial import KDTree


    def meet_in_the_middle_naive(list1, list2):
        # Split the lists into two halves.
        n = len(list1)
        half_n = n // 2
        list1_head, list1_tail = list1[:half_n], list1[half_n:]
        list2_head, list2_tail = list2[:half_n], list2[half_n:]
    
        # Calculate all combinations of each half along with their sums.
        # This is technically brute force, but each half is short enough to be feasible.
        head_combinations = [(sum(c), c) for c in itertools.product(*zip(list1_head, list2_head))]
        tail_combinations = [(sum(c), c) for c in itertools.product(*zip(list1_tail, list2_tail))]
    
        # Build a KDTree and search for the nearest points between the two halves.
        def as_real_xy(c):
            # Convert a complex number to a real value pair (x, y).
            # This is required for the KDTree to work correctly.
            return c.real, c.imag
    
        head_points = np.array([as_real_xy(p) for p, _ in head_combinations])
        tail_points = np.array([as_real_xy(p) for p, _ in tail_combinations])
        head_tree = KDTree(head_points)
        distances, head_indexes = head_tree.query(-tail_points)
        #                                         ^ Searching for the zero-sum combinations.
    
        # The above finds, for each point in tail, its nearest neighbor in head.
        # We then pick the tail point whose neighbor is closest overall.
        tail_index = distances.argmin()
        head_index = head_indexes[tail_index]
        best_combination = head_combinations[head_index][1] + tail_combinations[tail_index][1]
        return abs(sum(best_combination)), list(best_combination)
    

    Although a big computational improvement, this naive implementation is still inefficient and memory-intensive. I'll skip the implementation details, but I've further optimized it using more_itertools and numba.
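
    One of those tricks is `more_itertools.nth_product`, which reconstructs a single combination directly from its integer index, so the combinations themselves never need to be stored. A small sketch with made-up toy values:

```python
import itertools

import more_itertools

list1_head = [1 + 1j, 2 + 2j]
list2_head = [10 + 10j, 20 + 20j]

# The naive version materializes every combination up front:
all_combinations = list(itertools.product(*zip(list1_head, list2_head)))
print(all_combinations[3])  # (10+10j, 20+20j): both elements taken from list2

# The optimized version only stores the sums; once the best index is known,
# nth_product reconstructs that single combination from its index:
print(more_itertools.nth_product(3, *zip(list1_head, list2_head)))
```

    Because the index ordering of `nth_product` matches `itertools.product`, the same index also matches the bit-order used in `product_sum` below.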

    Here's the optimized version along with test and benchmark:

    import itertools
    import math
    import random
    import time
    
    import more_itertools
    import numpy as np
    from numba import jit
    from scipy.spatial import KDTree
    
    
    def generate_list(n):
        return [complex(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(n)]
    
    
    def baseline(list1, list2):
        assert len(list1) == len(list2)
        n = len(list1)
    
        min_magnitude = math.inf
        best_combination = None
        for choice in itertools.product([0, 1], repeat=n):
            current = [list1[i] if bit == 0 else list2[i] for i, bit in enumerate(choice)]
            total = sum(current)
            mag = abs(total)
            if mag < min_magnitude:
                min_magnitude = mag
                best_combination = current
        return min_magnitude, best_combination
    
    
    def meet_in_the_middle_naive(list1, list2):
        # Split the lists into two halves.
        n = len(list1)
        half_n = n // 2
        list1_head, list1_tail = list1[:half_n], list1[half_n:]
        list2_head, list2_tail = list2[:half_n], list2[half_n:]
    
        # Calculate all combinations of each half along with their sums.
        # This is technically brute force, but each half is short enough to be feasible.
        head_combinations = [(sum(c), c) for c in itertools.product(*zip(list1_head, list2_head))]
        tail_combinations = [(sum(c), c) for c in itertools.product(*zip(list1_tail, list2_tail))]
    
        # Build a KDTree and search for the nearest points between the two halves.
        def as_real_xy(c):
            # Convert a complex number to a real value pair (x, y).
            # This is required for the KDTree to work correctly.
            return c.real, c.imag
    
        head_points = np.array([as_real_xy(p) for p, _ in head_combinations])
        tail_points = np.array([as_real_xy(p) for p, _ in tail_combinations])
        head_tree = KDTree(head_points)
        distances, head_indexes = head_tree.query(-tail_points)
        #                                         ^ Searching for the zero-sum combinations.
    
        # The above finds, for each point in tail, its nearest neighbor in head.
        # We then pick the tail point whose neighbor is closest overall.
        tail_index = distances.argmin()
        head_index = head_indexes[tail_index]
        best_combination = head_combinations[head_index][1] + tail_combinations[tail_index][1]
        return abs(sum(best_combination)), list(best_combination)
    
    
    @jit(cache=True)
    def product_sum(arr_x, arr_y):
        """Calculate the sum of all combinations of two lists.
    
        Equivalent to: `[sum(current) for current in itertools.product(*zip(arr_x, arr_y))]`
        """
        n = len(arr_x)
        total_combinations = 1 << n  # 2^n combinations
        out = np.empty(total_combinations, dtype=arr_x.dtype)
        for i in range(total_combinations):
            current_sum = 0.0
            for j in range(n):
                if (i >> (n - j - 1)) & 1:
                    current_sum += arr_y[j]
                else:
                    current_sum += arr_x[j]
            out[i] = current_sum
        return out
    
    
    def meet_in_the_middle_optimized(list1, list2):
        n = len(list1)
        half_n = n // 2
        list1_head, list1_tail = list1[:half_n], list1[half_n:]
        list2_head, list2_tail = list2[:half_n], list2[half_n:]
    
        # Keeping the combinations themselves consumes a lot of memory, so we only keep the sums.
        head_points = product_sum(np.array(list1_head), np.array(list2_head))
        tail_points = product_sum(np.array(list1_tail), np.array(list2_tail))
    
        def as_real_xy(arr):
            # Equivalent to `np.array([(p.real, p.imag) for p in arr])` but much faster.
            return arr.view(f"f{arr.real.itemsize}").reshape(-1, 2)
    
        head_tree = KDTree(as_real_xy(head_points), balanced_tree=False)  # Set False if the inputs are mostly random.
        distances, head_indexes = head_tree.query(-as_real_xy(tail_points), workers=-1)  # -1 to use all CPUs.
    
        tail_index = distances.argmin()
        head_index = head_indexes[tail_index]
        # nth_product is equivalent to `itertools.product(...)[index]`.
        # With this, we can directly obtain the combination without iterating through all combinations.
        head_combination = more_itertools.nth_product(head_index, *zip(list1_head, list2_head))
        tail_combination = more_itertools.nth_product(tail_index, *zip(list1_tail, list2_tail))
        best_combination = head_combination + tail_combination
        return abs(sum(best_combination)), list(best_combination)
    
    
    def test():
        for n in range(2, 15):
            for seed in range(10):
                random.seed(seed)
                list1 = generate_list(n)
                list2 = generate_list(n)
    
                expected = baseline(list1, list2)
    
                actual = meet_in_the_middle_naive(list1, list2)
                assert expected == actual, f"Naive results do not match! {n=}, {seed=}"
    
                actual = meet_in_the_middle_optimized(list1, list2)
                assert expected == actual, f"Optimized results do not match! {n=}, {seed=}"
    
        print("All tests passed!")
    
    
    def benchmark():
        n = 51
        random.seed(0)
        list1 = generate_list(n)
        list2 = generate_list(n)
    
        started = time.perf_counter()
        _ = meet_in_the_middle_optimized(list1, list2)
        elapsed = time.perf_counter() - started
        print(f"n={n}, elapsed={elapsed:.0f} sec")
    
    
    if __name__ == "__main__":
        test()
        benchmark()
    

    This solution solved n=51 in less than 30 seconds on my PC.