haskellunificationhindley-milner

How to derive the type of an applicator applied to the identity function


I want to derive the type of the following contrived applicator applied to the identity function. To achieve this I probably have to unify the type portion of the first argument (a -> [b]) with the type of id:

ap :: (a -> [b]) -> a -> [b]
id :: a -> a

a -> [b]
a0 -> a0 -- fresh type vars

a ~ a0 -- mappings
[b] ~ a0

a0 -> a0 -- substitution

This is obviously wrong, since the expected type is [b] -> [b]. There is an ambiguity within the unification, because a0 cannot be equivalent to both a and [b], except for a ~ [b] . But what is the rule that tells me to substitute a with [b] and not the other way around, as I would have to do with ap :: ([a] -> b) -> [a] -> b for example.

I know this is a very specific question, sorry. Hopefully it is not too confusing!


Solution

  • Ok, new answer because I now understand the question being asked! To restate the question:

    Given

    ap :: (a -> [b]) -> a -> [b]
    id :: a -> a
    

    Explain how the type of the expression ap id is derived.

    Answer:

    Rename variables:

    ap :: (a -> [b]) -> a -> [b]
    id :: a0 -> a0
    

    Unify:

    (a -> [b]) ~ (a0 -> a0)
    

    Apply generativity a couple times, pulling the arguments from the (->) type constructor:

    a   ~ a0
    [b] ~ a0
    

    Apply commutativity/transitivity of type equality:

    [b] ~ a
    

    Substitute the most specific types known into the types of ap and id

    ap :: ([b] -> [b]) -> [b] -> [b]
    id :: [b] -> [b]
    

    [b] is the most specific type known, because it provides some restriction. The value must be a list of something. The other two equivalent type expressions just mean any type at all. You can think of unification as a constraint-solving process. You find the maximal type that satisfies the constraints provided, which amount to "it's a list of something" for this case.

    Now that the types are unified, the type of the function application is the type of the function's result:

    ap id :: [b] -> [b]
    

    I can see why the choice of [b] looks a little odd in this case, because only one of the three type expressions contributed factors to unify. There are more involved cases where constraints come from multiple places, though.

    Let's consider a more advanced case. This might involve some things you haven't seen before. If it does, I apologize for jumping straight to the deep end.

    Given:

    f1 :: (a -> b) -> f a -> f b
    f2 :: p c d -> (e, c) -> (e, d)
    

    Unify the types of f1 and f2.

    Let's be really careful with this one. First up, rewrite all the types in terms of prefix application. Even the (->) types. This is going to be ugly:

    f1 :: (->) ((->) a b) ((->) (f a) (f b))
    f2 :: (->) (p c d) ((->) ((,) e c) ((,) e d))
    

    Unify and apply generativity twice to top-level (->) type constructors:

    ((->) a b) ~ (p c d)
    ((->) (f a) (f b)) ~ ((->) ((,) e c) ((,) e d))
    

    And, just keep unifying and applying generativity:

    (->) ~ p
    a ~ c
    b ~ d
    f a ~ (,) e c
    f b ~ (,) e d
    f ~ (,) e
    

    Ok, we've built up a giant stack of constraints now. Choosing between a and c or b and d doesn't matter, as they're equivalently constrained. Let's choose letters closer to the beginning of the alphabet when it doesn't matter. (->) is more constrained than p, so it wins there, and (,) e is more constrained than f. Call it a winner too.

    Then switch back to infix type constructors to make it pretty, and the unified type is:

    (a -> b) -> (e, a) -> (e, b)
    

    Notice how each of the two starting types contributed a constraint to the final unified type. f1 requires the p type in f2 to be more specific, and f2 required the f type in f1 to be more specific.

    Overall, this is a super-mechanical process. It's also fiddly and requires precise tracking of what you know. There's a reason we mostly leave it to the compiler to handle this. It is absolutely useful in the cases when something goes wrong and you want to double-check the process yourself to see why the compiler is reporting an error, though.