
Can you strip a matrix in O(n^2) time?


I have an n by n binary matrix and three operations: I can zero out a row that contains exactly one 1, zero out a column that contains exactly one 1, or zero out a row that is an exact copy of another row. How can I repeatedly apply these operations until none of them can be applied, in O(n^2) total time?

I feel it might be possible. For each row we can keep a count of the number of ones in it, and do the same for each column. To find which row (or column) to zero, we can go down these counts in O(n) time to find one that equals 1. We can then go along that row (or column) to zero out the solitary one and decrement the counts of the row and column it sits on.
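
A minimal sketch of that bookkeeping (assuming the matrix is a plain list of lists of 0/1; zero_single_rows is an illustrative helper, and it makes only a single pass over the rows, ignoring the cascading and duplicate cases discussed below):

    def zero_single_rows(m):
        n = len(m)
        row_count = [sum(row) for row in m]
        col_count = [sum(m[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):
            if row_count[i] == 1:
                j = m[i].index(1)    # O(n) scan for the solitary one
                m[i][j] = 0
                row_count[i] -= 1
                col_count[j] -= 1    # column j may now have count 1
        return row_count, col_count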

If no row or column count is 1, we can sort the rows using radix sort and then zero out any duplicate rows. That should take O(n^2).
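
As a sketch of that duplicate pass (find_duplicate_rows is an illustrative helper; rows are lists of 0/1): one stable O(n) pass per column, least significant first, sorts the rows lexicographically in O(n^2) total, after which any duplicates are adjacent:

    def find_duplicate_rows(m):
        n = len(m)
        order = list(range(n))
        for j in reversed(range(n)):    # stable O(n) pass per bit position
            order = ([i for i in order if m[i][j] == 0]
                     + [i for i in order if m[i][j] == 1])
        dups = []
        for a, b in zip(order, order[1:]):
            if m[a] == m[b]:    # equal rows are now adjacent
                dups.append(b)  # b is an exact copy of a; safe to zero
        return dups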

When we zero out a row we also have to update up to n column counts, and some of those might become 1, so everything needs to be repeated.

The problem is if we need to do the O(n^2) work too many times. However, when we zero out a one in a column and that makes two rows identical, we know one of those copies is the row we just changed. Maybe that helps?

To recap: zeroing a solitary one in a row can leave a column with only a single one. Zeroing a solitary one in a column can either make two rows identical or leave a row with only a single one. Zeroing out a copy of a row can leave a column with only a single one.


Solution

  • To build on another answer: you could use Zobrist hashing, (historically?) often used in chess engines to quickly update a hash based on individual piece changes. You will get some undesirable collisions, but it's fast and "good enough", provided you implement logic to handle the rare collision.
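
    To show the incremental-update trick in isolation (a minimal sketch; n_cols, col_keys, row_hash and flip_cell are illustrative names, not part of the code below): give each column a random key, and define a row's hash as the XOR of the keys of its '1' columns. Flipping one cell is then an O(1) hash update rather than an O(n) rehash.

    from random import getrandbits

    n_cols = 4
    col_keys = [getrandbits(64) for _ in range(n_cols)]

    def row_hash(row_bits):
        # full rehash, shown only to define what the hash means
        h = 0
        for j, bit in enumerate(row_bits):
            if bit:
                h ^= col_keys[j]
        return h

    def flip_cell(h, j):
        # XOR toggles, so the same O(1) update both sets and clears column j
        return h ^ col_keys[j]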

    To determine which rows to zero out, I track the hash of each row, and maintain a set of hashes that are shared by more than one row (these represent potential duplicate rows, though they may just be ordinary hash collisions). This lets me quickly look up the corresponding rows in a dictionary and verify that they are indeed identical before zeroing one of them.

    For the other two rules, I also track rows that contain just a single '1'. In an optimized implementation, you could avoid this and just deal with each such row as it is created (removing 1-rows or 1-cols will only create at most a single new 1-row or 1-col, so you could just handle these recursively, and then iterate over each in the duplicate-row scenario; see the sketch below). This would avoid needing to track single rows/cols explicitly.
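
    As a rough sketch of that cascade (chase_singles is a hypothetical toy on a plain list-of-lists matrix with precomputed counts, separate from the class below; it applies only the two single-1 rules, not the duplicate rule):

    def chase_singles(m, row_count, col_count, kind, idx):
        # kind is "row" or "col"; idx is a line known to hold exactly one 1
        if kind == "row":
            j = m[idx].index(1)
            m[idx][j] = 0
            row_count[idx] -= 1
            col_count[j] -= 1
            if col_count[j] == 1:    # the only count that changed
                chase_singles(m, row_count, col_count, "col", j)
        else:
            i = next(i for i in range(len(m)) if m[i][idx] == 1)
            m[i][idx] = 0
            col_count[idx] -= 1
            row_count[i] -= 1
            if row_count[i] == 1:
                chase_singles(m, row_count, col_count, "row", i)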

    In theory, you could get astronomically unlucky and hit enough hash collisions to force the duplicate-row detection to O(N^2) instead of O(1), especially with a huge number of rows/cols, but you can also just increase the hash size to offset this.

    Here's some Python code demonstrating this approach. I have done very little testing, besides a few easy cases, so there could be a mistake somewhere, but it demonstrates the core ideas nonetheless.

    from collections import defaultdict
    from contextlib import contextmanager
    from itertools import combinations
    from random import randrange
    
    class Row:
        def __init__(self, index):
            self.index = index
            self.cols = set()
            
            self.id = randrange(2 ** 32)
            self.hash = 0
    
        def add(self, col):
            self.cols.add(col)
            self.hash ^= col.id
    
        def remove(self, col):
            self.cols.remove(col)
            self.hash ^= col.id
    
        def pop(self):
            # remove and return an arbitrary column, keeping the Zobrist hash in sync
            col = self.cols.pop()
            self.hash ^= col.id
            return col
    
        def __len__(self):
            return len(self.cols)
            
        def __hash__(self):
            return self.id
    
    class Matrix:
        def __init__(self, nrows, ncols, transpose=None):
            self.rows = list(map(Row, range(nrows)))
    
            if transpose:
                self.T = transpose
            else:
                self.T = Matrix(ncols, nrows, transpose=self)
    
            self.single_rows = set()              # rows with single '1' column
            self.hash_to_rows = defaultdict(set)  # dictionary mapping Zobrist Hashes to rows
            self.repeat_hashes = set()            # hashes with collisions (potential duplicate rows)
    
        @contextmanager
        def update_row(self, row):
            # tear down the bookkeeping for this row under its current hash,
            # let the caller mutate it, then rebuild under the new hash
            self.hash_to_rows[row.hash].discard(row)
            if len(self.hash_to_rows[row.hash]) < 2:
                self.repeat_hashes.discard(row.hash)

            yield row

            self.hash_to_rows[row.hash].add(row)
            # flag the hash as a potential duplicate only if at least one of
            # the colliding rows is non-empty (all-zero rows share hash 0)
            if len(self.hash_to_rows[row.hash]) > 1 and any(self.hash_to_rows[row.hash]):
                self.repeat_hashes.add(row.hash)

            if len(row) == 1:
                self.single_rows.add(row)
            else:
                self.single_rows.discard(row)
                
        def reduce_single_row(self):
            row = self.single_rows.pop()
            with self.update_row(row):
                col = row.pop()  # Row.pop keeps row.hash in sync (row.cols.pop() would not)
            with self.T.update_row(col):
                col.remove(row)
            return row
    
        def reduce_duplicate_row(self):
            h = self.repeat_hashes.pop()

            # hash matches may be collisions, so verify the rows really are equal
            rows = [row for row in self.hash_to_rows[h] if row]
            for row1, row2 in combinations(rows, 2):
                if row1.cols == row2.cols:
                    if len(rows) > 2:
                        self.repeat_hashes.add(h)  # more candidate duplicates remain
                    for col in list(row1.cols):
                        with self.T.update_row(col):
                            col.remove(row1)
                        with self.update_row(row1):
                            row1.remove(col)
                    return row1
            return None  # every pair was a genuine hash collision
    
        def reduce_step(self):
            # the duplicate rule only applies to rows, never columns, so only
            # self.repeat_hashes (not self.T.repeat_hashes) is consulted
            while self.repeat_hashes:
                row = self.reduce_duplicate_row()
                if row is not None:
                    return "duplicate row", row.index

            if self.single_rows:
                row = self.reduce_single_row()
                return "single row", row.index
            elif self.T.single_rows:
                col = self.T.reduce_single_row()
                return "single col", col.index
            else:
                return None
    
        def reduce(self):
            while step := self.reduce_step():
                yield step
    
        @classmethod
        def from_data(cls, data):
            nrows = len(data)
            ncols = len(data[0])
    
            matrix = cls(nrows, ncols)
            for row, rdata in zip(matrix.rows, data):
                for col, elem in zip(matrix.T.rows, rdata):
                    if elem:
                        with matrix.update_row(row):
                            row.add(col)
                        with matrix.T.update_row(col):
                            col.add(row)
                        
            return matrix
    

    You can use Matrix.reduce() to get the sequence of steps taken to reduce your matrix to zeros (as much as possible, anyway):

    data = [
        [1, 0, 0],
        [1, 1, 0],
        [1, 1, 1],
    ]
    matrix = Matrix.from_data(data)
    for step in matrix.reduce():
        print(step)
    

    Outputting:

    ('single row', 0)
    ('single col', 2)
    ('duplicate row', 1)
    ('single col', 0)
    ('single row', 2)