haskell type-inference subtyping forall rank-n-types

Why does (forall a. a -> forall b. b -> forall c. b -> c) subsume Int -> Int -> forall c. Int -> c, but forall a b c. a -> b -> b -> c does not?

I'm studying type inference for higher-ranked types, specifically based on the system described in the paper Practical Type Inference for Arbitrary-Rank Types by Peyton Jones et al.

In Section 4.9.3, the authors discuss why both types involved in subsumption (ρ₁ ≤ ρ₂) must be in weak-prenex form. The paper states:

The first clause is self explanatory, but the second might seem a little surprising: why must ρ₁ and ρ₂ be in weak-prenex form? Here is a counter-example when they are not. Suppose
σ₁ = ∀a. a → ∀b. b → ∀c. b → c
σ₂ = Int → ∀c. Int → c
σ₃ = ∀abc. a → b → b → c
Then it is derivable that ⊢⇓ (λx. x 3) : (σ₁ → σ₂), but it is not derivable that ⊢⇓ (λx. x 3) : (σ₃ → σ₂), although ⊢dsk σ₁ → σ₂ ≤ σ₃ → σ₂.

I wanted to understand this by testing it in GHC. I tried the following two examples:

-- Accepted by GHC
test1 :: (forall a. a -> forall b. b -> forall c. b -> c) -> (Int -> forall c. Int -> c)
test1 = \x -> x 3

-- Rejected by GHC
test2 :: (forall a. forall b. forall c. a -> b -> b -> c) -> (Int -> forall c. Int  -> c)
test2 = \x -> x 3

So test1 typechecks, but test2 does not.

Intuitively, it seems like:

forall a. a -> forall b. b -> forall c. b -> c  ≤  Int -> Int -> forall c. Int -> c

but

forall a b c. a -> b -> b -> c  ≤  Int -> Int -> forall c. Int -> c

does not hold. I think in both cases forall c on the right hand side would be pulled out before instantiating the foralls on the left.

Could someone explain in more detail:

Why GHC (and the system in the paper) accepts the first case but not the second, and where does this difference comes in the typing rules described in Peyton jones paper?

Solution

There is a lot going on in this question. Let's take a step back. The excerpt is talking about the following claim (Theorem 4.14, item 2):

If Γ ⊢⇓ t : ρ1 and ⊢dsk ρ1 ≤ ρ2 and ρ1 and ρ2 are in weak-prenex form, then Γ ⊢⇓ t : ρ2.

The question is why the condition "ρ1 and ρ2 are in weak-prenex form" is necessary. One might expect to make this simpler claim:

If Γ ⊢⇓ t : ρ1 and ⊢dsk ρ1 ≤ ρ2, then Γ ⊢⇓ t : ρ2.

And the paper argues that claim is wrong. You were focusing on ⊢dsk ρ1 ≤ ρ2, but that is the wrong thing to focus on. The goal is to find an example that satisfies the negation of the implication above. The negation of "If A then B" is "A and NOT B". We are looking for one example that satisfies:

Γ ⊢⇓ t : ρ1 and ⊢dsk ρ1 ≤ ρ2 and NOT Γ ⊢⇓ t : ρ2

The interesting part is "NOT Γ ⊢⇓ t : ρ2".

The counterexample from the paper turns out not to quite work, and it is overly complex anyway; more at the end of this answer. A simpler counterexample is ρ1 = Int → ∀a. a → a, ρ2 = ∀a. Int → a → a, t = λx. λy. y.

Proof. You can check Γ ⊢⇓ t : ρ1 and ⊢dsk ρ1 ≤ ρ2. Then we show that there is no possible derivation of Γ ⊢⇓ t : ρ2: the type system is syntax directed and t is an abstraction so only the ABS2 rule could apply, but that requires ρ2 to be an arrow → instead of a quantifier ∀.

Nobody is saying that the following subsumption

forall a b c. a -> b -> b -> c  ≤  Int -> Int -> forall c. Int -> c

is false. It is actually true for exactly the reason you said: deep subsumption "pulls out" the forall c on the right.

⊢⇓ is only one part of their type system, which is precisely the one that deals with types in weak-prenex form. More precisely, they deal with the part that remains after removing the quantifiers of the weak-prenex form. That's why it's OK that the theorems about it are restricted to weak-prenex types.

The typing relation that better represents their "whole" type system is ⊢⇓poly, for which there is an unrestricted subsumption lemma (item 1 of the same Theorem 4.14):

If Γ ⊢⇓poly t : ρ1 and ⊢dsk ρ1 ≤ ρ2, then Γ ⊢⇓poly t : ρ2.

The counterexample from the paper is incorrect because the GEN2 rule can be applied after the ABS2 rule so weak-prenex quantifiers can still be removed.

However we can modify the counterexample into a valid one for clause 4 of Theorem 4.14 which considers subsumption of contexts (instead of clause 2 which is about subsumption of the result type) by moving the variable into the context ourselves, as if we had a variant of the ABS2 rule with ⊢⇓ instead of ⊢⇓poly.

Claim: There exists no derivation of x : σ₃ ⊢⇓ x 3 : σ₂.

Because the rules are syntax-directed, such a derivation must come from APP, then VAR on x and INT on 3. The VAR on x produces x : Int → Int → Int → c1 for a fresh c1, then in the APP rule we have σ' = Int → Int → c1 so the third clause of APP becomes Int → Int → c1 ≤ Int -> forall c. Int -> c which does not hold. Crucially, c1 is not quantified; in comparison, the following subsumption holds: forall c1. Int → Int → c1 ≤ Int -> forall c. Int -> c, a fact which might have been used if we modified the APP rule slightly to generalize σ' before the subsumption.

You tried that same example in GHC, and GHC rejected it. Note that GHC does not allow deep skolemization by default (forall c1. Int → Int → c1 ≤ Int -> forall c. Int -> c uses the DEEP-SKOL rule), so that's one more obstacle to the one of generalizing applications. Interestingly there was a whole kerfuffle about it in recent years (~2020): GHC started to implement deep skolemization in 9.0, it broke a lot of code, and quickly it was rolled back into an opt-in.