
Can you strip a matrix in O(n^2) time?


I have an n by n binary matrix and three operations: I can zero out a row that contains exactly one 1, zero out a column that contains exactly one 1, or zero out a row that is an exact copy of another row. How can I repeatedly apply these operations until none of them can be applied, in O(n^2) total time?

I feel it might be possible. For each row we can keep a count of the number of ones in it, and do the same for each column. To find which row (or column) to zero, we can go down these counts in O(n) time to find one that equals 1. We can then go along that row (or column) to zero out the solitary one and decrement the counts of the row and column it sits on.
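
A minimal sketch of that bookkeeping (assuming the matrix is a plain list of lists of 0/1; zero_single_rows is an illustrative helper, and it makes only a single pass over the rows, ignoring the cascading and duplicate cases discussed below):

    def zero_single_rows(m):
        n = len(m)
        row_count = [sum(row) for row in m]
        col_count = [sum(m[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):
            if row_count[i] == 1:
                j = m[i].index(1)    # O(n) scan for the solitary one
                m[i][j] = 0
                row_count[i] -= 1
                col_count[j] -= 1    # column j may now have count 1
        return row_count, col_count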

If no row or column count is 1, we can sort the rows using radix sort and then zero out any duplicate rows. That should take O(n^2).
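
As a sketch of that duplicate pass (find_duplicate_rows is an illustrative helper; rows are lists of 0/1): one stable O(n) pass per column, least significant first, sorts the rows lexicographically in O(n^2) total, after which any duplicates are adjacent:

    def find_duplicate_rows(m):
        n = len(m)
        order = list(range(n))
        for j in reversed(range(n)):    # stable O(n) pass per bit position
            order = ([i for i in order if m[i][j] == 0]
                     + [i for i in order if m[i][j] == 1])
        dups = []
        for a, b in zip(order, order[1:]):
            if m[a] == m[b]:    # equal rows are now adjacent
                dups.append(b)  # b is an exact copy of a; safe to zero
        return dups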

When we zero out a row we also have to update up to n column counts, and some of those might become 1, so everything needs to be repeated.

The problem is if we need to do the O(n^2) work too many times. However, when we zero out a one in a column and that makes two rows identical, we know one of those copies is the row we just changed. Maybe that helps?

To recap: zeroing a solitary one in a row can leave a column with only a single one. Zeroing a solitary one in a column can either make two rows identical or leave a row with only a single one. Zeroing out a copy of a row can leave a column with only a single one.


Solution

  • To build on another answer: you could use Zobrist hashing, (historically?) often used in chess engines to quickly update a hash based on individual piece changes. You will get some undesirable collisions, but it's fast and "good enough", provided you implement logic to handle the rare collision.
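
    To show the incremental-update trick in isolation (a minimal sketch; n_cols, col_keys, row_hash and flip_cell are illustrative names, not part of the code below): give each column a random key, and define a row's hash as the XOR of the keys of its '1' columns. Flipping one cell is then an O(1) hash update rather than an O(n) rehash.

    from random import getrandbits

    n_cols = 4
    col_keys = [getrandbits(64) for _ in range(n_cols)]

    def row_hash(row_bits):
        # full rehash, shown only to define what the hash means
        h = 0
        for j, bit in enumerate(row_bits):
            if bit:
                h ^= col_keys[j]
        return h

    def flip_cell(h, j):
        # XOR toggles, so the same O(1) update both sets and clears column j
        return h ^ col_keys[j]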

    To determine which rows to zero out, I track the hash of each row, and maintain a set of hashes that are shared by more than one row (these represent potential duplicate rows, though they may just be ordinary hash collisions). This lets me quickly look up the corresponding rows in a dictionary and verify that they are indeed identical before zeroing one of them.

    For the other two rules, I also track rows that contain just a single '1'. In an optimized implementation, you could avoid this and just deal with each such row as it is created (removing 1-rows or 1-cols will only create at most a single new 1-row or 1-col, so you could just handle these recursively, and then iterate over each in the duplicate-row scenario; see the sketch below). This would avoid needing to track single rows/cols explicitly.
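
    As a rough sketch of that cascade (chase_singles is a hypothetical toy on a plain list-of-lists matrix with precomputed counts, separate from the class below; it applies only the two single-1 rules, not the duplicate rule):

    def chase_singles(m, row_count, col_count, kind, idx):
        # kind is "row" or "col"; idx is a line known to hold exactly one 1
        if kind == "row":
            j = m[idx].index(1)
            m[idx][j] = 0
            row_count[idx] -= 1
            col_count[j] -= 1
            if col_count[j] == 1:    # the only count that changed
                chase_singles(m, row_count, col_count, "col", j)
        else:
            i = next(i for i in range(len(m)) if m[i][idx] == 1)
            m[i][idx] = 0
            col_count[idx] -= 1
            row_count[i] -= 1
            if row_count[i] == 1:
                chase_singles(m, row_count, col_count, "row", i)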

    In theory, you could get astronomically unlucky and hit enough hash collisions to force the duplicate-row detection to O(N^2) instead of O(1), especially with a huge number of rows/cols, but you can also just increase the hash size to offset this.

    Here's some Python code demonstrating this approach. I have done very little testing, besides a few easy cases, so there could be a mistake somewhere, but it demonstrates the core ideas nonetheless.

    from collections import defaultdict
    from contextlib import contextmanager
    from itertools import combinations
    from random import randrange
    
    class Row:
        def __init__(self, index):
            self.index = index
            self.cols = set()
            
            self.id = randrange(2 ** 32)
            self.hash = 0
    
        def add(self, col):
            self.cols.add(col)
            self.hash ^= col.id
    
        def remove(self, col):
            self.cols.remove(col)
            self.hash ^= col.id
    
        def pop(self):
            # remove and return an arbitrary column, keeping the Zobrist hash in sync
            col = self.cols.pop()
            self.hash ^= col.id
            return col
    
        def __len__(self):
            return len(self.cols)
            
        def __hash__(self):
            return self.id
    
    class Matrix:
        def __init__(self, nrows, ncols, transpose=None):
            self.rows = list(map(Row, range(nrows)))
    
            if transpose:
                self.T = transpose
            else:
                self.T = Matrix(ncols, nrows, transpose=self)
    
            self.single_rows = set()              # rows with single '1' column
            self.hash_to_rows = defaultdict(set)  # dictionary mapping Zobrist Hashes to rows
            self.repeat_hashes = set()            # hashes with collisions (potential duplicate rows)
    
        @contextmanager
        def update_row(self, row):
            # tear down the bookkeeping for this row under its current hash,
            # let the caller mutate it, then rebuild under the new hash
            self.hash_to_rows[row.hash].discard(row)
            if len(self.hash_to_rows[row.hash]) < 2:
                self.repeat_hashes.discard(row.hash)

            yield row

            self.hash_to_rows[row.hash].add(row)
            # flag the hash as a potential duplicate only if at least one of
            # the colliding rows is non-empty (all-zero rows share hash 0)
            if len(self.hash_to_rows[row.hash]) > 1 and any(self.hash_to_rows[row.hash]):
                self.repeat_hashes.add(row.hash)

            if len(row) == 1:
                self.single_rows.add(row)
            else:
                self.single_rows.discard(row)
                
        def reduce_single_row(self):
            row = self.single_rows.pop()
            with self.update_row(row):
                col = row.pop()  # Row.pop keeps row.hash in sync (row.cols.pop() would not)
            with self.T.update_row(col):
                col.remove(row)
            return row
    
        def reduce_duplicate_row(self):
            h = self.repeat_hashes.pop()

            # hash matches may be collisions, so verify the rows really are equal
            rows = [row for row in self.hash_to_rows[h] if row]
            for row1, row2 in combinations(rows, 2):
                if row1.cols == row2.cols:
                    if len(rows) > 2:
                        self.repeat_hashes.add(h)  # more candidate duplicates remain
                    for col in list(row1.cols):
                        with self.T.update_row(col):
                            col.remove(row1)
                        with self.update_row(row1):
                            row1.remove(col)
                    return row1
            return None  # every pair was a genuine hash collision
    
        def reduce_step(self):
            # the duplicate rule only applies to rows, never columns, so only
            # self.repeat_hashes (not self.T.repeat_hashes) is consulted
            while self.repeat_hashes:
                row = self.reduce_duplicate_row()
                if row is not None:
                    return "duplicate row", row.index

            if self.single_rows:
                row = self.reduce_single_row()
                return "single row", row.index
            elif self.T.single_rows:
                col = self.T.reduce_single_row()
                return "single col", col.index
            else:
                return None
    
        def reduce(self):
            while step := self.reduce_step():
                yield step
    
        @classmethod
        def from_data(cls, data):
            nrows = len(data)
            ncols = len(data[0])
    
            matrix = cls(nrows, ncols)
            for row, rdata in zip(matrix.rows, data):
                for col, elem in zip(matrix.T.rows, rdata):
                    if elem:
                        with matrix.update_row(row):
                            row.add(col)
                        with matrix.T.update_row(col):
                            col.add(row)
                        
            return matrix
    

    You can use Matrix.reduce() to get the sequence of steps taken to reduce your matrix to zeros (as much as possible, anyway):

    data = [
        [1, 0, 0],
        [1, 1, 0],
        [1, 1, 1],
    ]
    matrix = Matrix.from_data(data)
    for step in matrix.reduce():
        print(step)
    

    Outputting:

    ('single row', 0)
    ('single col', 2)
    ('duplicate row', 1)
    ('single col', 0)
    ('single row', 2)