haskelltypespowerset

Patterns for managing/representing selection from a set of options in Haskell


Algebraic data types make it easy to allow one item from a set to be selected: just use an appropriate sum type.

I'm curious how one would implement the option of selecting strictly n items or fewer from a set. I can see how it might be possible to implement this with Data.Set, but I wonder if there's a pattern or an algebraic structure that would be more robust.

Say a burger can have three toppings from the set Pickle | Onion | Lettuce | Tomato etc. It's reasonable to want to display all the options in a UI, so if we use Data.Set, right away I notice that Data.Set.all is missing, so there's no easy way to print all the values that can be selected. But Data.Set.powerSet is available, so I'm thinking maybe I should be getting the user to choose an element of the subset of the power set with cardinality <= 3.

Using an element of the power set to represent a selection from the base set seems like a good idea. It doesn't look much like any UI I can think of, though. The power set functor is a monad, though I'm not sure if this is relevant (See discussion about power set as a functor here Sets, Functors and Eq confusion).

Maybe someone who has solved a similar problem before will have advice on how to best make this work. I'm really looking for conceptual insights into what an "n choose k" type looks like, if that makes sense.


Solution

  • Here's an answer, of sorts.

    Here are a few extensions and imports needed for the rest of the code below.

    {-# LANGUAGE DataKinds #-}
    {-# LANGUAGE GADTs #-}
    {-# LANGUAGE KindSignatures #-}
    {-# LANGUAGE PolyKinds #-}
    {-# LANGUAGE ScopedTypeVariables #-}
    {-# LANGUAGE StandaloneDeriving #-}
    {-# LANGUAGE TypeApplications #-}
    {-# LANGUAGE TypeFamilies #-}
    {-# LANGUAGE TypeOperators #-}
    {-# LANGUAGE UndecidableInstances #-}
    
    import GHC.TypeNats
    import Data.Proxy
    import Data.Set (Set)
    import qualified Data.Set as Set
    

    As you're probably aware, the nice thing about representing a single item as an ADT:

    data Topping = Pickle | Onion | Lettuce | Tomato deriving (Show, Eq, Ord, Enum, Bounded)
    

    is that the definition of the type makes it impossible to represent anything other than exactly one choice of Topping, provided we ignore the possibility of having an undefined value. It's equally easy to represent "up to one choice of Topping" in this foolproof way, using a Maybe type:

    type UpToOneTopping = Maybe Topping
    

    We can also represent zero or more (including a possibly infinite number of) toppings, with possible duplicates, using a list:

    type ManyToppings = [Topping]
    

    However, it's difficult to represent a set of toppings without duplicates this same way. We can use the Set Topping type, but we're no longer relying on the type system to make sets with duplicates unrepresentable. Instead, we're relying on the implementation of the Set type in the containers package, and the restricted interface it provides, to guarantee that the set won't contain any duplicates. If we dug into the guts of the Set implementation, we'd probably discover that the internal representation can allow sets with duplicates, and we're relying on the correctness of the code written by the Set author to make sure this can't happen.

    In contrast, the Topping type above makes a stronger guarantee than this, making anything other than a single type unrepresentable by construction. (Well, I guess you could say we're relying on the correctness of the compiler, but it's common to think of this kind of guarantee as "stronger" than one that relies on the correctness of the Set implementation.)

    In a similar vein to the Set type, it would be possible to use an UpToThree Topping type to represent a set of zero to three distinct toppings, but the guarantee would be provided by the implementation and its sanctioned interface, not by the data representation itself. That is, we could write:

    newtype UpToThree a = UpToThree (Set a)
    
    upToThree :: (Ord a) => [a] -> UpToThree a
    upToThree xs | length xs' > 3 = error "no more than three allowed"
                 | otherwise = UpToThree xs'
      where xs' = Set.fromList xs
    

    If we only create UpToThree values via this upToThree "smart constructor" (either through programmer discipline or using Haskell's package system to limit the interface exported to users of the data type), then we can provide a guarantee of sorts that any UpToThree value will represent no more than three distinct a values.

    In practical Haskell code, this is probably the most reasonable approach. One limitation is that it requires a separate new type for each up-to-k number. We could make the number a field in the data type:

    data UpToK a = UpToK Int (Set a)
    
    upToK :: (Ord a) => Int -> [a] -> UpToK a
    upToK k xs | length xs' > k = error "too many"
                 | otherwise = UpToK k xs'
      where xs' = Set.fromList xs
    

    but this UpToK type is much less useful than it might first appear. The problem is that the smart constructor ensures that a particular UpToK represents a number of distinct values of type a that's no more than than the count given by the integer field k encoded in the value, so it ensures consistency between the two fields in the value, but the type system doesn't enforce anything about k, so if we have a function that requires no more than three toppings using the original UpToThree:

    data Burger = Burger
    placeOnBurger :: UpToThree Topping -> Burger
    placeOnBurger u = undefined
    

    we can ensure through a type signature using UpToThree that we receive no more than three toppings, provided the implementation of the UpToThree type is sound, but if we try to write an UpToK version, we end up having to inspect the field anyway to ensure our precondition is met:

    placeOnBurger' :: UpToK Topping -> Burger
    placeOnBurger' (UpToK k _) | k > 3 = error "no one can fit so many toppings!"
    placeOnBurger' u = undefined
    

    which means we could have just checked the length of a Set directly and didn't need the UpToK type at all:

    placeOnBurger'' :: Set Topping -> Burger
    placeOnBurger'' s | length s > 3 = error "too many toppings for a burger"
    placeOnBurger'' s = undefined
    

    Fortunately, there is a way to use the DataKinds extension, to index a collection of types by a non-negative number. With a few other extensions, this allows us to write:

    newtype UpTo (k :: Nat) a = UpTo (Set a)
    upTo :: forall k a. (KnownNat k, Ord a) => [a] -> UpTo k a
    upTo xs | length xs' > fromIntegral (natVal (Proxy @k)) = error "too many"
            | otherwise = UpTo xs'
      where xs' = Set.fromList xs
    

    This UpTo type is much more useful than UpToK. It still relies on a protected upTo smart constructor to ensure that only valid UpTo values are produced, but the upper limit is now part of the type, so we can write:

    placeOnBurger''' :: UpTo 3 Topping -> Burger
    placeOnBurger''' s = undefined
    

    and we have a strong, type system guarantee that this version of placeOnBurger''' will only be called with an UpTo 3 Topping value, which implies a (weaker) smart constructor guarantee that the number of toppings passed will be no more than three.

    If you want the stronger guarantee of encoding the topping limit in the type so that it's enforced by the type system itself, the same way Maybe Topping enforces a zero-or-one topping rule, then it's possible, but it can quickly get unwieldy.

    One method is to use a GADT in combination with a Peano natural that's automatically promoted to the type level by the DataKinds extension:

    data Peano = Z | S Peano
    

    It's also helpful to have a type-level function for converting from plain type-level naturals to these Peano type-level naturals:

    type family P n where
      P 0 = Z
      P n = S (P (n-1))
    

    Then we can easily create a GADT representing a list of "no more than n" elements:

    -- a list of no more than `n` elements of type `a`
    data LimitedList (n :: Peano) a where
      Empty :: LimitedList n a
      Cons :: a -> LimitedList n a -> LimitedList (S n) a
    infixr 5 `Cons`
    deriving instance (Show a) => Show (LimitedList n a)
    

    It takes a little study to understand why this representation works, but observe what happens in GHCi when we try to create a 3-element list whose type-level limits are 3, 5, and 2:

    λ> 1 `Cons` 2 `Cons` 3 `Cons` Empty :: LimitedList (P 3) Int
    ...works...
    λ> 1 `Cons` 2 `Cons` 3 `Cons` Empty :: LimitedList (P 5) Int
    ...works...
    λ> 1 `Cons` 2 `Cons` 3 `Cons` Empty :: LimitedList (P 2) Int
    ...generates a type error...
    

    The downside of this GADT is that it allows duplicates and doesn't provide a unique representation, so if we had multiple representations of pickles and onions as a no-more-than-3 list of toppings:

    top1 = Pickle `Cons` Onion `Cons` Empty :: LimitedList (P 3) Topping
    top2 = Onion `Cons` Pickle `Cons` Empty :: LimitedList (P 3) Topping
    top3 = Onion `Cons` Pickle `Cons` Onion `Cons` Empty :: LimitedList (P 3) Topping
    

    but wanted them to be treated uniformly (compare as equal, generate the same list of toppings in the same order when we run toList on them, etc.), we'd end up having to write some more code to provide these (weak) guarantees.

    To answer another part of your question, it's possible to write a function to generate all valid LimitedLists of a certain size for any enumerable type, though the output will include these duplicates and alternative representations. Writing such a function requires the use of a so-called "singleton" representation for the Peano naturals:

    data SPeano n where
      SZ :: SPeano Z
      SS :: SPeano n -> SPeano (S n)
    
    allLimitedLists :: (Enum a, Bounded a) => SPeano n -> [LimitedList n a]
    allLimitedLists SZ = [Empty]
    allLimitedLists (SS n) = [x `Cons` xs | x <- [minBound..maxBound]
                               , xs <- rest]
                             ++ map levelUp rest
      where rest = allLimitedLists n
            levelUp :: LimitedList n a -> LimitedList (S n) a
            levelUp Empty = Empty
            levelUp (x `Cons` xs) = x `Cons` levelUp xs
    
    all3Toppings :: [LimitedList (P 3) Topping]
    all3Toppings = allLimitedLists (SS (SS (SS SZ)))
    
    main = print all3Toppings
    

    The rather strange levelUp function here is an expensive identity function, at least at the term level. (At the type level, it's not the identity, which is kind of the point.) Anyway, it's an unfortunate artifact of the choice of representation of LimitedList. Ironically, it can be safely replaced with levelUp = unsafeCoerce. At least, I think it can. It seems to work, at any rate.

    Before I give one last approach, here's all the code so far in case you want to play with it:

    {-# LANGUAGE DataKinds #-}
    {-# LANGUAGE GADTs #-}
    {-# LANGUAGE KindSignatures #-}
    {-# LANGUAGE PolyKinds #-}
    {-# LANGUAGE ScopedTypeVariables #-}
    {-# LANGUAGE StandaloneDeriving #-}
    {-# LANGUAGE TypeApplications #-}
    {-# LANGUAGE TypeFamilies #-}
    {-# LANGUAGE TypeOperators #-}
    {-# LANGUAGE UndecidableInstances #-}
    
    import GHC.TypeNats
    import Data.Proxy
    import Data.Set (Set)
    import qualified Data.Set as Set
    
    data Topping = Pickle | Onion | Lettuce | Tomato deriving (Show, Eq, Ord, Enum, Bounded)
    
    type UpToOneTopping = Maybe Topping
    
    type ManyToppings = [Topping]
    
    newtype UpToThree a = UpToThree (Set a)
    
    upToThree :: (Ord a) => [a] -> UpToThree a
    upToThree xs | length xs' > 3 = error "no more than three allowed"
                 | otherwise = UpToThree xs'
      where xs' = Set.fromList xs
    
    data UpToK a = UpToK Int (Set a)
    
    upToK :: (Ord a) => Int -> [a] -> UpToK a
    upToK k xs | length xs' > k = error "too many"
                 | otherwise = UpToK k xs'
      where xs' = Set.fromList xs
    
    data Burger = Burger
    placeOnBurger :: UpToThree Topping -> Burger
    placeOnBurger u = undefined
    
    placeOnBurger' :: UpToK Topping -> Burger
    placeOnBurger' (UpToK k _) | k > 3 = error "no one can fit so many toppings!"
    placeOnBurger' u = undefined
    
    placeOnBurger'' :: Set Topping -> Burger
    placeOnBurger'' s | length s > 3 = error "too many toppings for a burger"
    placeOnBurger'' s = undefined
    
    newtype UpTo (k :: Nat) a = UpTo (Set a)
    upTo :: forall k a. (KnownNat k, Ord a) => [a] -> UpTo k a
    upTo xs | length xs' > fromIntegral (natVal (Proxy @k)) = error "too many"
            | otherwise = UpTo xs'
      where xs' = Set.fromList xs
    
    placeOnBurger''' :: UpTo 3 Topping -> Burger
    placeOnBurger''' s = undefined
    
    data Peano = Z | S Peano
    
    type family P n where
      P 0 = Z
      P n = S (P (n-1))
    
    -- a list of no more than `n` elements of type `a`
    data LimitedList (n :: Peano) a where
      Empty :: LimitedList n a
      Cons :: a -> LimitedList n a -> LimitedList (S n) a
    infixr 5 `Cons`
    deriving instance (Show a) => Show (LimitedList n a)
    
    top1 = Pickle `Cons` Onion `Cons` Empty :: LimitedList (P 3) Topping
    top2 = Onion `Cons` Pickle `Cons` Empty :: LimitedList (P 3) Topping
    top3 = Onion `Cons` Pickle `Cons` Onion `Cons` Empty :: LimitedList (P 3) Topping
    
    data SPeano n where
      SZ :: SPeano Z
      SS :: SPeano n -> SPeano (S n)
    
    allLimitedLists :: (Enum a, Bounded a) => SPeano n -> [LimitedList n a]
    allLimitedLists SZ = [Empty]
    allLimitedLists (SS n) = [x `Cons` xs | x <- [minBound..maxBound]
                               , xs <- rest]
                             ++ map levelUp rest
      where rest = allLimitedLists n
            levelUp :: LimitedList n a -> LimitedList (S n) a
            levelUp Empty = Empty
            levelUp (x `Cons` xs) = x `Cons` levelUp xs
    
    all3Toppings :: [LimitedList (P 3) Topping]
    all3Toppings = allLimitedLists (SS (SS (SS SZ)))
    
    main = print all3Toppings
    

    Finally, it is possible to enforce a unique, duplicate-free representation at the type level, but it adds another layer of complexity and makes the type much more difficult to use. Here's one way to do it by encoding the minimum element in the type and using a constraint to permit only smaller elements to be prepended. I've used the singletons package to take care of some of the boilerplate, so we'll require quite a few extensions plus some imports:

    {-# LANGUAGE DataKinds #-}
    {-# LANGUAGE EmptyCase #-}
    {-# LANGUAGE FlexibleInstances #-}
    {-# LANGUAGE GADTs #-}
    {-# LANGUAGE InstanceSigs #-}
    {-# LANGUAGE KindSignatures #-}
    {-# LANGUAGE PolyKinds #-}
    {-# LANGUAGE ScopedTypeVariables #-}
    {-# LANGUAGE StandaloneDeriving #-}
    {-# LANGUAGE TemplateHaskell #-}
    {-# LANGUAGE TypeApplications #-}
    {-# LANGUAGE TypeFamilies #-}
    {-# LANGUAGE TypeOperators #-}
    {-# LANGUAGE TypeSynonymInstances #-}
    {-# LANGUAGE UndecidableInstances #-}
    
    import Data.Singletons
    import Data.Singletons.Prelude
    import Data.Singletons.TH
    

    We'll need singletons for both our Peano naturals and our Topping type, as we'll see below.

    $(singletons [d|
      data Peano = Z | S Peano
      data Topping = Pickle | Onion | Lettuce | Tomato deriving (Show, Eq, Ord, Enum, Bounded)
      |])
    

    Here, we use promote to define type functions P and Before. These could have been defined using type families, but promoting plain Haskell functions gives nicer syntax.

    $(promote [d|
      p 0 = Z
      p n = S (p (n-1))
    
      before _ Nothing = True
      before x (Just y) = x < y
      |])
    

    Our limited set will be represented by the following GADT, which represents a (strictly) ascending list of elements of type a whose head (smallest) element is h, and which contains no more than n elements. We use the Before type function here to permit an x value to be prepended to a list only if it's strictly less than the list's head element.

    data MinLimitedSet (n :: Peano) (h :: Maybe a) where
      Empty' :: MinLimitedSet n Nothing
      Cons' :: (Before x min ~ True) => Sing x -> MinLimitedSet n min -> MinLimitedSet (S n) (Just x)
    infixr 5 `Cons'`
    

    Using this MinLimitedSet type in code would require specifying the minimum element of the list in the type signatures. That's not very convenient, so we provide an existential type LimitedSet that represents a MinLimitedSet with an unspecified minimum element:

    data LimitedSet (n :: Peano) a where
      LimitedSet :: MinLimitedSet n (h :: Maybe a) -> LimitedSet n a
    

    With these GADTs in place, we can define an example value:

    top1' :: LimitedSet (P 3) Topping
    top1' = LimitedSet $ sing @Pickle `Cons'` sing @Onion `Cons'` Empty'
    

    Note that this definition is valid only because Pickle and Onion are in strictly ascending order (according to the Ord instance for Topping) and there are no more than three elements being combined. If you try to create a list of too many toppings, or a list with duplicates or out of strictly ascending order, you'll get a compile-time type error. To use such lists in programs, you'll typically want to convert them to plain term-level lists, and that's what the toList function does.

    toList :: forall n a. (SingKind a) => LimitedSet n a -> [Demote a]
    toList (LimitedSet asc) = go asc
      where go :: forall n lst. MinLimitedSet n lst -> [Demote a]
            go (Empty') = []
            go (x `Cons'` xs) = fromSing x : go xs
    
    main = print $ toList top1'
    

    The full code:

    {-# LANGUAGE DataKinds #-}
    {-# LANGUAGE EmptyCase #-}
    {-# LANGUAGE FlexibleInstances #-}
    {-# LANGUAGE GADTs #-}
    {-# LANGUAGE InstanceSigs #-}
    {-# LANGUAGE KindSignatures #-}
    {-# LANGUAGE PolyKinds #-}
    {-# LANGUAGE ScopedTypeVariables #-}
    {-# LANGUAGE StandaloneDeriving #-}
    {-# LANGUAGE TemplateHaskell #-}
    {-# LANGUAGE TypeApplications #-}
    {-# LANGUAGE TypeFamilies #-}
    {-# LANGUAGE TypeOperators #-}
    {-# LANGUAGE TypeSynonymInstances #-}
    {-# LANGUAGE UndecidableInstances #-}
    
    import Data.Singletons
    import Data.Singletons.Prelude
    import Data.Singletons.TH
    
    $(singletons [d|
      data Peano = Z | S Peano
      data Topping = Pickle | Onion | Lettuce | Tomato deriving (Show, Eq, Ord, Enum, Bounded)
      |])
    
    $(promote [d|
      p 0 = Z
      p n = S (p (n-1))
    
      before _ Nothing = True
      before x (Just y) = x < y
      |])
    
    data MinLimitedSet (n :: Peano) (h :: Maybe a) where
      Empty' :: MinLimitedSet n Nothing
      Cons' :: (Before x min ~ True) => Sing x -> MinLimitedSet n min -> MinLimitedSet (S n) (Just x)
    infixr 5 `Cons'`
    data LimitedSet (n :: Peano) a where
      LimitedSet :: MinLimitedSet n (h :: Maybe a) -> LimitedSet n a
    
    top1' :: LimitedSet (P 3) Topping
    top1' = LimitedSet $ sing @Pickle `Cons'` sing @Onion `Cons'` Empty'
    
    toList :: forall n a. (SingKind a) => LimitedSet n a -> [Demote a]
    toList (LimitedSet asc) = go asc
      where go :: forall n lst. MinLimitedSet n lst -> [Demote a]
            go (Empty') = []
            go (x `Cons'` xs) = fromSing x : go xs
    
    main = print $ toList top1'