haskellalgebraic-data-typesphantom-types

Convert Sum Type to Phantom Type via Type Classes


I was experimenting with phantom types in Haskell. My goal is to convert the LangCode Type to it's corresponding Phantom Type representation via Type Classes for example DE to Lang DE.

module Main (main) where

import Data.Proxy (Proxy(..))

data DE
data EN

data LangCode
  = DE
  | EN
  deriving (Eq, Show)

type Lang a = Proxy a

de :: Lang DE
de = Proxy

en :: Lang EN
en = Proxy

class ToLangCode a where
  toLangCode :: Lang a -> LangCode

instance ToLangCode DE where
  toLangCode _ = DE
instance ToLangCode EN where
  toLangCode _ = EN

class FromLangCode a where
  fromLangCode :: LangCode -> Lang a

instance FromLangCode DE where
  fromLangCode DE = Proxy
instance FromLangCode EN where
  fromLangCode EN = Proxy

main :: IO ()
main = do
  print $ de -- Output => Proxy
  print $ en -- Output => Proxy

  print $ toLangCode de -- Output => DE
  print $ toLangCode en -- Output => EN

  -- works
  print $ (fromLangCode DE :: Lang DE) -- Output => Proxy
  print $ (fromLangCode EN :: Lang EN) -- Output => Proxy

  -- throws an error
  print $ fromLangCode DE -- Output => Proxy
  print $ fromLangCode EN -- Output => Proxy

With a type annotation it works fine. But without it I get this error.

[1 of 1] Compiling Main             ( main.hs, main.o )

main.hs:50:11: error:
    * Ambiguous type variable `a0' arising from a use of `fromLangCode'
      prevents the constraint `(FromLangCode a0)' from being solved.
      Probable fix: use a type annotation to specify what `a0' should be.
      These potential instances exist:
        instance FromLangCode DE -- Defined at main.hs:32:10
        instance FromLangCode EN -- Defined at main.hs:34:10
    * In the second argument of `($)', namely `fromLangCode DE'
      In a stmt of a 'do' block: print $ fromLangCode DE
      In the expression:
        do print $ de
           print $ en
           print $ toLangCode de
           print $ toLangCode en
           ....
   |
50 |   print $ fromLangCode DE -- Output => Proxy
   |           ^^^^^^^^^^^^^^^

main.hs:51:11: error:
    * Ambiguous type variable `a1' arising from a use of `fromLangCode'
      prevents the constraint `(FromLangCode a1)' from being solved.
      Probable fix: use a type annotation to specify what `a1' should be.
      These potential instances exist:
        instance FromLangCode DE -- Defined at main.hs:32:10
        instance FromLangCode EN -- Defined at main.hs:34:10
    * In the second argument of `($)', namely `fromLangCode EN'
      In a stmt of a 'do' block: print $ fromLangCode EN
      In the expression:
        do print $ de
           print $ en
           print $ toLangCode de
           print $ toLangCode en
           ....
   |
51 |   print $ fromLangCode EN -- Output => Proxy
   |           ^^^^^^^^^^^^^^^
exit status 1

My question is. Is it possible to implement it in a way so that the type annotations are no longer needed?

Updated version

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}

module Main (main) where

data LangCode = DE | EN deriving (Eq, Show)

data SLangCode a where
    SDE :: SLangCode DE
    SEN :: SLangCode EN

data Lang (a :: LangCode) where
  LangDE :: Lang 'DE
  LangEN :: Lang 'EN

deriving instance Show (Lang 'DE)
deriving instance Show (Lang 'EN)

slcDE :: SLangCode a -> Lang 'DE -> Lang a
slcDE SDE t = t
slcDE SEN _ = LangEN

slcEN :: SLangCode a -> Lang 'EN -> Lang a
slcEN SEN t = t
slcEN SDE _ = LangDE

class ToLangCode a where
  toLangCode :: Lang a -> LangCode

instance ToLangCode 'DE where
  toLangCode _ = DE
instance ToLangCode 'EN where
  toLangCode _ = EN

class FromLangCode a where
  fromLangCode :: SLangCode a -> LangCode -> Lang a

instance FromLangCode 'DE where
  fromLangCode SDE DE = LangDE
instance FromLangCode 'EN where
  fromLangCode SEN EN = LangEN

main :: IO ()
main = do
  print $ toLangCode LangDE -- Output => DE
  print $ toLangCode LangEN -- Output => EN

  print $ fromLangCode SDE DE -- Output => LangDE
  print $ fromLangCode SEN EN -- Output => LangEN

Solution

  • Here's the problem. It's a fundamental property of types that if the term-level expressions e1 and e2 have the same type t, then substituting e1 with e2 in a program doesn't change any of the program's types. That's what makes them types.

    You want the expression (with no explicit type signature):

    fromLangCode EN
    

    to have type Lang EN, and that's easy enough. But if the term-level expressions EN and DE are constructors from the same sum type, LangCode, then substituting one for the other won't change any types, so the expression:

    fromLangCode DE
    

    will still have type Lang EN, which is clearly not what you want.

    So, if you want two different inferred types:

    fromLangCode EN :: Lang EN
    fromLangCode DE :: Lang DE
    

    then any solution is going to require that the term-level expressions EN and DE have different types, which means that you can't have:

    data LangCode = EN | DE
    

    So, that's the short answer -- you can't freely convert between a sum type LangCode and proxies of type Lang lang. Or rather, you can easily convert from type Lang lang to term LangCode using a type class (like ToLangCode), but you can't really convert back.

    There are many solutions to this "problem", but it kind of depends on what you're trying to do, which is why folks in the comments are asking you questions about "use cases" and "expected behavior".

    A Simple Solution that Accomplishes Nothing

    One simple solution is to write:

    data EN = EN
    data DE = DE
    

    Here, the term-level expressions EN and DE have different types (EN and DE respectively). This allows you to easily implement the desired interface using your main function verbatim:

    import Data.Proxy
    
    data EN = EN deriving (Show)
    data DE = DE deriving (Show)
    
    type Lang a = Proxy a
    
    de :: Lang DE
    de = Proxy
    
    en :: Lang EN
    en = Proxy
    
    class ToLangCode a where
      toLangCode :: Lang a -> a
    
    instance ToLangCode DE where
      toLangCode _ = DE
    instance ToLangCode EN where
      toLangCode _ = EN
    
    class FromLangCode a where
      fromLangCode :: a -> Lang a
    
    instance FromLangCode DE where
      fromLangCode DE = Proxy
    instance FromLangCode EN where
      fromLangCode EN = Proxy
    
    main :: IO ()
    main = do
      print $ de -- Output => Proxy
      print $ en -- Output => Proxy
    
      print $ toLangCode de -- Output => DE
      print $ toLangCode en -- Output => EN
    
      -- works
      print $ (fromLangCode DE :: Lang DE) -- Output => Proxy
      print $ (fromLangCode EN :: Lang EN) -- Output => Proxy
    
      -- works fine now
      print $ fromLangCode DE -- Output => Proxy
      print $ fromLangCode EN -- Output => Proxy
    

    If you're suspicious of this "solution", you're right to be. It doesn't really accomplish anything, since the term-level expressions EN and DE are already distinct at the type level, and this program is really just converting between one type level representation (types EN and DE) and another (types Lang EN and Lang DE).

    Using a GADT

    One way to kind of do what you want is to use a GADT. If we define LangCode as a "generalized" sum type:

    {-# LANGUAGE GADTs #-}
    
    data EN
    data DE
    
    data LangCode lang where
      EN :: LangCode EN
      DE :: LangCode DE
    

    everything works more or less like my previous example, with a few minor changes to type signatures, and main left unchanged, as below:

    {-# LANGUAGE GADTs #-}
    {-# LANGUAGE StandaloneDeriving #-}
    
    import Data.Proxy
    
    data EN
    data DE
    
    data LangCode lang where
      EN :: LangCode EN
      DE :: LangCode DE
    deriving instance Show (LangCode a)
    
    type Lang a = Proxy a
    
    de :: Lang DE
    de = Proxy
    
    en :: Lang EN
    en = Proxy
    
    class ToLangCode a where
      toLangCode :: Lang a -> LangCode a
    
    instance ToLangCode DE where
      toLangCode _ = DE
    instance ToLangCode EN where
      toLangCode _ = EN
    
    class FromLangCode a where
      fromLangCode :: LangCode a -> Lang a
    
    instance FromLangCode DE where
      fromLangCode DE = Proxy
    instance FromLangCode EN where
      fromLangCode EN = Proxy
    
    main :: IO ()
    main = do
      print $ de -- Output => Proxy
      print $ en -- Output => Proxy
    
      print $ toLangCode de -- Output => DE
      print $ toLangCode en -- Output => EN
    
      -- works
      print $ (fromLangCode DE :: Lang DE) -- Output => Proxy
      print $ (fromLangCode EN :: Lang EN) -- Output => Proxy
    
      -- works fine now
      print $ fromLangCode DE -- Output => Proxy
      print $ fromLangCode EN -- Output => Proxy
    

    So, we can freely convert between this generalized sum type and a phantom type representation.

    This really isn't a big improvement over the previous example. The two constructors are now formally part of a generalized sum type, but the term-level expressions EN and DE are already distinct at the type level, and we're just converting between one type-level representation (types LangCode EN and LangCode DE) and another (types Lang EN and Lang DE).

    However, the same criticism could be levelled at your "updated example". By introducing singletons (a generalized sum type), you too are already making the expressions fromLangCode SDE DE and fromLangCode SEN EN distinct at the type-level in that first, singleton argument. The second, term-level argument plays no useful role here and could be eliminated, so you're just converting from one type-level representation (SLangCode DE versus SLangCode EN) to another (Lang DE versus Lang EN).

    Existental Types

    Actual useful conversion between term and type-level representations usually involves an existential type somewhere in the mix. It helps to consider a slightly more realistic example. Suppose you might want to be able to use the type system to help avoid inappropriately mixing languages. For example:

    import Data.List.Extra
    
    data EN
    data DE
    newtype Text lang = Text String deriving (Show)
    
    item :: Text EN
    item = Text "The big elephant"
    
    artikel :: Text DE
    artikel = Text "Die großen Elefanten"
    
    fixß :: Text DE -> Text DE
    fixß (Text x) = Text $ replace "ß" "ss" x
    
    pluralize :: Text EN -> Text EN
    pluralize (Text noun) | "s" `isSuffixOf` noun = Text $ noun ++ "ses"
                          | otherwise = Text $ noun ++ "s"
    
    message_en :: Text EN
    message_en = pluralize item
    
    message_de :: Text DE
    message_de = fixß artikel
    
    -- type system prevents applying german manipulations to english text
    type_error_1 = fixß item
    

    But, you might also like to make a run-time decision about which language is being used in a particular expression:

    data LangCode = EN | DE
    
    main :: IO ()
    main = do
      let language = EN  -- assume this comes from args or user input
      -- type error: `mytext` can't be both `Text EN` and `Text DE`
      let mytext = case language of
            EN -> message_en
            DE -> message_de
      print mytext
    

    This doesn't work because the type of mytext can't depend on a runtime computation. That is, there's no simple way to convert the runtime term-level value language :: LangCode (a sum type) to a type-level value Lang language, the desired type of mytext.

    The usual solution in is to use an existential type:

    {-# LANGUAGE ExistentialQuantification #-}
    data SomeText = forall lang. SomeText (Text lang)
    

    Here, the type SomeText represents text in some unspecified language (i.e., a value of type Text lang for some unspecified type lang). Now, mytext can be assigned text in a runtime-determined language by wrapping it with the SomeText constructor.

    let mytext = case language of
          EN -> SomeText message_en
          DE -> SomeText message_de
    

    We are limited in what we can do with a SomeText value like mytext -- we can't do anything that depends on knowing the language, like applying pluralize or fixß or whatever. However, one thing we can do is extract the string, since that works for any language:

    getText :: SomeText -> String
    getText (SomeText (Text str)) = str
    

    which allows us to write a useful main:

    main :: IO ()
    main = do
      let language = EN
      let mytext = case language of
            EN -> SomeText message_en
            DE -> SomeText message_de
      print $ getText mytext
    

    Here's the full working example:

    {-# LANGUAGE ExistentialQuantification #-}
    
    import Data.List.Extra
    
    data EN
    data DE
    data LangCode = EN | DE
    newtype Text lang = Text String deriving (Show)
    
    data SomeText = forall lang. SomeText (Text lang)
    
    item :: Text EN
    item = Text "The big elephant"
    
    artikel :: Text DE
    artikel = Text "Die großen Elefanten"
    
    fixß :: Text DE -> Text DE
    fixß (Text x) = Text $ replace "ß" "ss" x
    
    pluralize :: Text EN -> Text EN
    pluralize (Text noun) | "s" `isSuffixOf` noun = Text $ noun ++ "ses"
                          | otherwise = Text $ noun ++ "s"
    
    message_en :: Text EN
    message_en = pluralize item
    
    message_de :: Text DE
    message_de = fixß artikel
    
    getText :: SomeText -> String
    getText (SomeText (Text str)) = str
    
    main :: IO ()
    main = do
      let language = DE
      let mytext = case language of
            EN -> SomeText message_en
            DE -> SomeText message_de
      print $ getText mytext
    

    What we've done here is successfully converted a term-level value from a sum type (language) to a type-level value Text language by wrapping it in a SomeText constructor.

    Applied to Your Example

    We can use the same technique to convert freely between a sum type and type-level proxies, by shielding the type-level proxies in an existential. It might look something like this. Note how I've used a custom Show instance to differentiate proxies of different types, to prove we're doing something useful.

    {-# LANGUAGE ExistentialQuantification #-}
    
    import Data.Proxy
    
    data DE
    data EN
    
    -- sum type
    data LangCode
      = DE
      | EN
      deriving (Eq, Show)
    
    -- existential for type-level proxies
    type Lang = Proxy
    data SomeLang = forall a. ToLangCode a => SomeLang (Lang a)
    instance Show SomeLang where
      show (SomeLang lang) = "SomeLang Proxy<" ++ show (toLangCode' lang) ++ ">"
    
    -- convert from LangCode to SomeLang
    fromLangCode :: LangCode -> SomeLang
    fromLangCode EN = SomeLang (Proxy :: Lang EN)
    fromLangCode DE = SomeLang (Proxy :: Lang DE)
    
    -- convert from SomeLang to LangCode
    class ToLangCode lang where
      toLangCode' :: Lang lang -> LangCode
    instance ToLangCode EN where
      toLangCode' Proxy = EN
    instance ToLangCode DE where
      toLangCode' Proxy = DE
    toLangCode :: SomeLang -> LangCode
    toLangCode (SomeLang lang) = toLangCode' lang
    
    de :: SomeLang
    de = SomeLang (Proxy :: Lang DE)
    
    en :: SomeLang
    en = SomeLang (Proxy :: Lang EN)
    
    main :: IO ()
    main = do
      print $ de -- Output => SomeLang Proxy<DE>
      print $ en -- Output => SomeLang Proxy<EN>
    
      print $ toLangCode de -- Output => DE
      print $ toLangCode en -- Output => EN
    
      print $ fromLangCode DE -- Output => SomeLang Proxy<DE>
      print $ fromLangCode EN -- Output => SomeLang Proxy<EN>