pythonnumpyscipyor-toolsheuristics

Heuristic to choose five column arrays that maximise the dot product


I have a sparse 60000x10000 matrix M where each element is either a 1 or 0. Each column in the matrix is a different combination of signals (ie. 1s and 0s). I want to choose five column vectors from M and take the Hadamard (ie. element-wise) product of them; I call the resulting vector the strategy vector. After this step, I compute the dot product of this strategy vector with a target vector (that does not change). The target vector is filled with 1s and -1s such that having a 1 in a specific row of the strategy vector is either rewarded or penalised.

Is there some heuristic or linear algebra method that I could use to help me pick the five vectors from the matrix M that result in a high dot product? I don't have any experience with Google's OR tools nor Scipy's optimization methods so I am not too sure if they can be applied to my problem. Advice on this would be much appreciated! :)

Note: the five column vectors given as the solution does not need to be the optimal one; I'd rather have something that does not take months/years to run.


Solution

  • First of all, thanks for a good question. I don't get to practice numpy that often. Also, I don't have much experience in posting to SE, so any feedback, code critique, and opinions relating to the answer are welcome.

    This was an attempt at finding an optimal solution at first, but I didn't manage to deal with the complexity. The algorithm should, however, give you a greedy solution that might prove to be adequate.

    Colab Notebook (Python code + Octave validation)

    Core Idea

    Note: During runtime, I've transposed the matrix. So, the column vectors in the question correspond to row vectors in the algorithm.

    Notice that you can multiply the target with one vector at a time, effectively getting a new target, but with some 0s in it. These will never change, so you can filter out some computations by removing those rows (columns, in the algorithm) in further computations entirely - both from the target and the matrix. - you're then left with a valid target again (only 1s and -1 in it).

    That's the basic idea of the algorithm. Given:

    Do an exponentially-complex O((n*m)^b) depth-first search, but decrease the complexity of the calculations in deeper layers by reducing target/matrix size, while cutting down a few search paths with some heuristics.

    Heuristics used

    1. The best score achieved so far is known in every recursion step. Compute an optimistic vector (turn -1 to 0) and check what scores can still be achieved. Do not search in levels where the score cannot be surpassed.

      This is useless if the best vectors in the matrix have 1s and 0s equally distributed. The optimistic scores are just too high. However, it gets better with more sparsity.

    2. Ignore duplicates. Basically, do not check duplicate vectors in the same layer. Because we reduce the matrix size, the chance for ending up with duplicates increases in deeper recursion levels.

    Further Thoughts on Heuristics The most valuable ones are those that eliminate the vector choices at the start. There's probably a way to find vectors that are worse-or-equal than others, with respect to their affects on the target. Say, if v1 only differs from v2 by an extra 1, and target has a -1 in that row, then v1 is worse-or-equal than v2.

    The problem is that we need to find more than 1 vector, and can't readily discard the rest. If we have 10 vectors, each worse-or-equal than the one before, we still have to keep 5 at the start (in case they're still the best option), then 4 in the next recursion level, 3 in the following, etc.

    Maybe it's possible to produce a tree and pass it on in into recursion? Still, that doesn't help trim down the search space at the start... Maybe it would help to only consider 1 or 2 of the vectors in the worse-or-equal chain? That would explore more diverse solutions, but doesn't guarantee that it's more optimal.

    Warning: Note that the MATRIX and TARGET in the example are in int8. If you use these for the dot product, it will overflow. Though I think all operations in the algorithm are creating new variables, so are not affected.

    Code

    # Given:
    TARGET = np.random.choice([1, -1], size=60000).astype(np.int8)
    MATRIX = np.random.randint(0, 2, size=(10000,60000), dtype=np.int8)
    
    # Tunable - increase to search more vectors, at the cost of time.
    # Performs better if the best vectors in the matrix are sparse
    MAX_BRANCHES = 3   # can give more for sparser matrices
    
    # Usage
    score, picked_vectors_idx = pick_vectors(TARGET, MATRIX, 5)
    
    # Function
    def pick_vectors(init_target, init_matrix, vectors_left_to_pick: int, best_prev_result=float("-inf")):
        assert vectors_left_to_pick >= 1
        if init_target.shape == (0, ) or len(init_matrix.shape) <= 1 or init_matrix.shape[0] == 0 or init_matrix.shape[1] == 0:
            return float("inf"), None
        target = init_target.copy()
        matrix = init_matrix.copy()
    
        neg_matrix = np.multiply(target, matrix)
        neg_matrix_sum = neg_matrix.sum(axis=1)
    
        if vectors_left_to_pick == 1:
            picked_id = np.argmax(neg_matrix_sum)
            score = neg_matrix[picked_id].sum()
            return score, [picked_id]
    
        else:
            sort_order = np.argsort(neg_matrix_sum)[::-1]
            sorted_sums = neg_matrix_sum[sort_order]
            sorted_neg_matrix = neg_matrix[sort_order]
            sorted_matrix = matrix[sort_order]
    
            best_score = best_prev_result
            best_picked_vector_idx = None
    
            # Heuristic 1 (H1) - optimistic target.
            # Set a maximum score that can still be achieved
            optimistic_target = target.copy()
            optimistic_target[target == -1] = 0
            if optimistic_target.sum() <= best_score:
                # This check can be removed - the scores are too high at this point
                return float("-inf"), None
    
            # Heuristic 2 (H2) - ignore duplicates
            vecs_tried = set()
    
            # MAIN GOAL:   for picked_id, picked_vector in enumerate(sorted_matrix):
            for picked_id, picked_vector in enumerate(sorted_matrix[:MAX_BRANCHES]):
                # H2
                picked_tuple = tuple(picked_vector)
                if picked_tuple in vecs_tried:
                    continue
                else:
                    vecs_tried.add(picked_tuple)
    
                # Discard picked vector
                new_matrix = np.delete(sorted_matrix, picked_id, axis=0)
                
                # Discard matrix and target rows where vector is 0
                ones = np.argwhere(picked_vector == 1).squeeze()
                new_matrix = new_matrix[:, ones]
                new_target = target[ones]
                if len(new_matrix.shape) <= 1 or new_matrix.shape[0] == 0:
                    return float("-inf"), None
    
                # H1: Do not compute if best score cannot be improved
                new_optimistic_target = optimistic_target[ones]
                optimistic_matrix = np.multiply(new_matrix, new_optimistic_target)
                optimistic_sums = optimistic_matrix.sum(axis=1)
                optimistic_viable_vector_idx = optimistic_sums > best_score
                if optimistic_sums.max() <= best_score:
                    continue
                new_matrix = new_matrix[optimistic_viable_vector_idx]
             
                score, next_picked_vector_idx = pick_vectors(new_target, new_matrix, vectors_left_to_pick - 1, best_prev_result=best_score)
                
                if score <= best_score:
                    continue
    
                # Convert idx of trimmed-down matrix into sorted matrix IDs
                for i, returned_id in enumerate(next_picked_vector_idx):
                    # H1: Loop until you hit the required number of 'True'
                    values_passed = 0
                    j = 0
                    while True:
                        value_picked: bool = optimistic_viable_vector_idx[j]
                        if value_picked:
                            values_passed += 1
                            if values_passed-1 == returned_id:
                                next_picked_vector_idx[i] = j
                                break
                        j += 1
    
                    # picked_vector index
                    if returned_id >= picked_id:
                        next_picked_vector_idx[i] += 1
    
                best_score = score
    
                # Convert from sorted matrix to input matrix IDs before returning
                matrix_id = sort_order[picked_id]
                next_picked_vector_idx = [sort_order[x] for x in next_picked_vector_idx]
                best_picked_vector_idx = [matrix_id] + next_picked_vector_idx
    
            return best_score, best_picked_vector_idx