haskellmatrixfunctional-programmingnested-listsdepth-first-search

Efficient connected components from a 2D array


In an m x n matrix consisting of 0s and 1s, we are tasked with counting the 4-directionally connected islands of 1s. A python implementation is as follows:

    def numIslands(grid):
        def sink(i, j):
            if 0 <= i < len(grid) and 0 <= j < len(grid[i]) and grid[i][j] == '1':
                grid[i][j] = '0'
                list(map(sink, (i+1, i-1, i, i), (j, j, j+1, j-1)))
                return 1
            return 0
        return sum(sink(i, j) for i in range(len(grid)) for j in range(len(grid[i])))

I have tried and failed to solve this problem in Haskell under the following constraints:

Things I've tried:

The issue in all cases is the plumbing code to convert the matrix into any nice graph representation where we can appeal to a library function is always too long.

Here you can see a working implementation using Algebra.Graph.AdjacencyMap that does a lot of redundant work:

data Coord = C Int Int deriving (Eq, Ord, Show)
data Cell a = Cell Coord a deriving (Eq, Ord)
type A = Cell Char

coord :: Cell a -> Coord
coord (Cell c _) = c

mkcell :: Int -> Int -> a -> Cell a
mkcell i j = Cell (C i j)

cells :: [[a]] -> [[Cell a]]
cells = zipWith (map . uncurry . mkcell) [0..] . map (zip [0..])

is1 :: A -> Bool
is1 (Cell _ '1') = True
is1 (Cell _  _ ) = False

cardinals :: Coord -> [Coord]
cardinals c = map ($ c) [up, down, left, right]
  where
    up    (C i j) = C (i+1) j
    down  (C i j) = C (i-1) j
    left  (C i j) = C i (j-1)
    right (C i j) = C i (j+1)

coordMap :: [[a]] -> Map Coord [a]
coordMap board = M.fromAscList [(C i j, [z]) | (i,zs) <- zip [0..] board, (j,z) <- zip [0..] zs]

mkstars :: [[A]] -> [(A, [A])]
mkstars xss = map (id &&& nbs) $ ones =<< xss
  where
    ones  = filter is1
    value = flip (M.findWithDefault []) (coordMap xss)
    nbs   = ones . value <=< cardinals . coord

islands :: [[Char]] -> Int
islands = length . dfsForest . stars . mkstars . cells

And some simple test cases:

main :: IO ()
main = do
  hspecWith defaultConfig {configFailFast = True} $ do
    describe "islands" $ do

      it "handles test case 0" $ do
        islands [ ['1', '1', '1', '1', '0']
                , ['1', '1', '0', '1', '0']
                , ['1', '1', '0', '0', '0']
                , ['0', '0', '0', '0', '0'] ] `shouldBe` 1

      it "handles test case 1" $ do
        islands [ ['1', '1', '0', '0', '0']
                , ['1', '1', '0', '0', '0']
                , ['0', '0', '1', '0', '0']
                , ['0', '0', '0', '1', '1'] ] `shouldBe` 3

      it "handles test case 2" $ do
        islands [['1']] `shouldBe` 1

Notice that we perform a pair of zip [0..]-like operations in two places. I think this ought to be unnecessary. This clocks in at 220 tokens according to vim, and that doesn't even include the dfsForest logic or the stuff from Data.Map.Strict! It feels ridiculous that programming in Haskell should be so verbose and complex, so I'm hoping it's just a skill issue.

Can anyone point me in the right direction? I'd also be happy with an answer of the form, "this is not possible but [here] is the best we can do."

If it helps, ultimately, my goal is to come up with a standard method for very concisely marshalling 2D array problems into a format compatible with the graph algorithms implemented in either containers, algebraic-graphs, or fgl. Although right now the stuff in algebraic-graphs is looking the nicest to me.


Solution

  • Here's a fairly concise implementation:

    {-# Language BlockArguments #-}
    
    import Data.Partition
    import Data.Vector (Vector)
    import qualified Data.Set as S
    import qualified Data.Vector as V
    
    islands :: Vector (Vector Bool) -> Int
    islands = length . nontrivialSets . fromSets . V.toList . V.concat . V.toList . \grid ->
        flip V.imap grid \i row ->
        flip V.imap row \j val -> S.fromList
        [ (i',j',lmao)
        | val
        , (i',j') <- [(i,j), (i-1,j), (i+1,j), (i,j-1), (i,j+1)]
        , or (grid V.!? i' >>= (V.!? j'))
        , lmao <- "01"
        ]
    
    

    This uses data-partition and incurs a logarithmic overhead, as is common when translating mutable algorithms to immutable analogs naively, so runtime is O(mn log(mn)).* The lmao variable doubles the size of each island; nontrivialSets only reports sets with size strictly greater than one, and we want to report on islands with just one node in them.

    See it run in ghci:

    > islands . fmap (fmap ('1'==)) . read $ "[\"11000\", \"11000\", \"00100\", \"00011\"]"
    3
    

    A similar technique can be done with containers if you prefer.

    {-# Language BlockArguments #-}
    import Data.Graph
    import Data.Vector (Vector)
    import qualified Data.Vector as V
    
    -- I hate myself for this name. I also love myself for this name
    ifordl' :: Vector a -> b -> (b -> Int -> a -> b) -> b
    ifordl' v z f = V.ifoldl' f z v
    
    islands :: Vector (Vector Bool) -> Int
    islands = length . scc . (\(g, _, _) -> g) . graphFromEdges . \grid ->
        ifordl' grid [] \nodes i row ->
        ifordl' row nodes \nodes' j val ->
        [((), (i, j), filter (\(i',j') -> or (grid V.!? i >>= (V.!? j)))
                             [(i+1,j), (i-1,j), (i,j+1), (i,j-1)])
        |val] ++ nodes'
    
    

    As with the previous approach, this is O(mn log(mn)).


    * It is well-known that any mutable algorithm can be made pure with an O(log n) multiplicative penalty. I believe it is an open question whether there are any problems which provably require that penalty, so it's likely that a cleverer theoretician than I could improve from here!