listhaskelllist-comprehensionmonads

In Haskell what are the inner workings of list comprehension?


I'm rather new to Haskell and was looking at this post here: Cartesian product of 2 lists in Haskell.

In the answer there is this snippet of code:

cartProd xs ys = [(x,y) | x <- xs, y <- ys]

Which with these two lists:

xs = [1,2,3]
ys = [4,5,6]

would yield

[(1,4),(1,5),(1,6),(2,4),(2,5),(2,6),(3,4),(3,5),(3,6)]

Had I not seen this result I would have assumed it would just have returned

[(1,4),(2,5),(3,6)]

because it would traverse both lists at the same time.

But now it - to programming languages I know better - looks like a double for loop used to traverse a matrix:

for (int x = 1; x < 4; x++)
    for(int y = 4; y < 7; y++)
        //make tuple (x,y)

What causes a list comprehension to behave in this manner?


Solution

  • This introduction explains the syntax of list comprehension. Basically one can say that every x <- list means an additional nested "for"-loop to generate tuples and every predicate is simply tested. Thus the expression:

    [(x,y) | x <- xs, even x, y <- ys, even 3*y-div x 2]
    

    Would be translated in an imperative language as:

    for (var x : xs) {
        if(even(x)) {
        for(var y : ys) {
            if(even(3*y-x/2)) {
                yield (x,y)
            }
        }
    }
    

    yield is a keyword that is sometimes used with co-routines. Furthermore as for yield the evaluation is done lazily. This enables for instance to generate all even integers as:

    [x|x <-[2..],even x]
    

    List monads

    In order to understand list comprehension fundamentally, one needs to know what Monads are. Every list comprehension can be translated into a list monad. For instance your example is translated into:

    do x <- xs
       (do y <- ys
           return (x,y))
    

    Which is again syntactical sugar for:

    xs >>= (\x -> (ys >>= \y -> return (x,y)))
    

    A monad is an important concept in functional programming (and probably one better reads the wikipedia page), because it is a bit hard to master. The sometimes say a monads are like burritos,....

    Once you more or less understand a monad: a monad is a type-class with a return statement and a >>= channeling statement. Now the return statement for the inner part is easy:

    return x = [x]
    

    So that means that each time x and y are set, you will create a tuple (x,y) and return it as a singleton list: thus [(x,y)]. Now the "bind" operator >>= needs to "glue" ys and \y -> return (x,y) together. This is done by implementing it as:

    (>>=) xs f = concat $ map f xs
    

    In other words you do a mapping and concatenate the result of the mapping.

    Now if take the second part of the unsugared expression into account:

    ys >>= \y -> return (x,y)
    

    It means that for a given x (we now abstract away), we will map every element in ys to a tuple (x,y) and return it. We will thus generate a list of lists with every list being a singleton containing a tuple. Something like (if ys=[1,2]):

    [[(x,1)],[(x,2)]]
    

    Now the >>= will furthermore concat it into:

    \x -> [(x,1),(x,2)]
    

    Until now, we've abstracted the x away (assumed it was one). But now we can take the first part of that expression:

    xs >>= \x -> [(x,1),(x,2)]
    

    If xs=[3,5], it means we will create again lists:

    [[(3,1),(3,2)],[(5,1),(5,2)]]
    

    and after the concat:

    [(3,1),(3,2),(5,1),(5,2)]
    

    Which is what we expect for:

    [(x,y)|x<-[3,5],y<-[1,2]]