Why Haskell treats a stuck application of a type family as a valid type?

Take this code snippet (I changed the type family to a closed one after the various comments and answers mentioned it):

-- a closed type family for which I won't define any instances
type family EmptyTypeFamily t where

-- 1. this type doesn't exist, since there's no `instance EmptyTypeFamily ()`
type NotAType = EmptyTypeFamily ()

-- 2. this value has a type that doesn't exist
untypedValue :: NotAType
untypedValue = undefined

main :: IO ()
main = do
  -- 3. the value with no type can be used in expressions
  return untypedValue
  return . id . fix $ \x -> untypedValue
  
  -- ERROR, finally! No instance for `Show (EmptyTypeFamily ())`
  print untypedValue

I find it quite counter intuitive that I can name a stuck application of a type family (1) and even define values for it (2) and use them in expressions (3). Of course I can't define any typeclass instances for it, since the type doesn't even exist. But shouldn't I get an error just by naming that non-type, let alone using it? This makes it harder to detect issues in my code. I read the GHC doc, which mentions "stuck" twice, but doesn't answer my question.

What's the reason why Haskell treats EmptyTypeFamily () as a valid type for which you can even define values and whatnot, when it isn't?

Solution

There are two factors that mean GHC needs to be able to contemplate stuck type families without immediately throwing an error and giving up.

The first is, as chi's answer pointed out, type families defined via separate type instance declarations are open. It's theoretically possible for your module to define EmptyTypeFamily, NotAType, and untypedValue and export them all, and for some other module to import your module and add a type instance EmptyTypeFamily () declaration that makes sense of this.

But there are also closed type families, where the set of instances are known statically to any module that has imported the type family at all. It's tempting to say these should at least immediately throw an error if they can't be resolved, and indeed that could conceivably be done in a case like yours where all of the arguments to the type family are fully concrete types. But the second reason GHC needs the concept of stuck type families that aren't an error is that type families are usually used in more complex cases with type variables in play. Consider this:

{-# LANGUAGE TypeFamilies #-}

type family TF t
  where TF Bool = Int
        TF Char = Bool

foo :: t -> TF t -> TF t
foo = flip const

The TF t in the type of foo is a stuck type family application; GHC can't resolve it here. Whether the declared type for foo is an error or not depends on the variable t, which depends on the context in which foo is used, not just its definition.

So GHC needs this ability to work with things with a type given by a type expression that it cannot actually resolve now. Given that, I'm not overly bothered by it not throwing errors eagerly in cases like the OP (assuming EmptyTypeFamily was rewritten to be a closed family; it definitely shouldn't with the open family). I don't know precisely, but I've always assumed it was either:

It's not possible to make a general procedure for GHC to be able to tell the difference between a stuck application that could potentially be unstuck with more information, and one that definitely can't.
It is possible, but the inconsistency and unpredictability is not considered desirable. Remember that some other GADTs or type families might be able to resolve for any type at all (much like a non-strict function not necessarily needing its argument to be defined), and be applied to a stuck family application; if GHC is distinguishing between "definitely stuck" applications that immediately throw an error and "potentially unstuck-able" applications, the programmer would need to be able to always make the same determination in their head to tell when complex cases are allowable. I'm genuinely unsure whether this could be made pleasant to work with.
It's possible implement and can be made nicely predictable, but it's a special case in the general handling of type families that no one has added yet. The original feature only had open families, I'm fairly sure, so the original guts of type family handling would be written without any real possibility of "definitely stuck" applications existing to be detected.

If you're using closed type families, one thing you can do is add a catch-all case that resolves to a TypeError. Like so:

{-# LANGUAGE DataKinds, TypeFamilies, UndecidableInstances #-}

-- These are in GHC.TypeError as of base 4.17, but the current
-- default on my system is base 4.16. GHC.TypeLits still exports
-- them for backwards compatibility so this will work, but might
-- eventually be deprecated?
import GHC.TypeLits ( TypeError
                    , ErrorMessage ( Text, ShowType, (:<>:) )
                    )
type family TF t
  where TF t = TypeError (Text "No TF instance for " :<>: ShowType t)

bad :: TF ()
bad = undefined

This helps, but it doesn't catch every possible use of a stuck TF application. It won't catch definitions like bad, because it's still "lazy" about actually evaluating TF (); you told it the type was TF (), and the implementation is undefined which can be any type so it happily unifies with TF () and passes the type check without the compiler ever needing to actually evaluate TF (). But it does catch more cases than a stuck type family application would because it's not stuck: it resolves to a type error. If you have any other binding where it has to infer a type and that depends on the type of bad, it seems to hit the error; even something like boom = bad where you'd think it would similarly be able to unify without any evaluation. And even asking for the type of bad in ghci with :t generates a type error.

At the very least it gives you better error messages than No instance for Show (EmptyTypeFamily ()) because it will resolve the type family to the type error, rather than go looking for an instance that could match a stuck type family and complain about the missing instance.