I am working with a custom data type MSet
that represents a multiset. The multiset is defined as:
data MSet a = MSet [(a, Int)]
Where each tuple (a, Int) represents an element a and its multiplicity (a positive integer). The MSet
type mantains an invariant:
I was tasked with defining a function, mapMSet
:
mapMSet :: (a -> b) -> MSet a -> MSet b
This function applies a given function f :: a -> b
to all elements of the MSet
. However, if f
maps two distinct elements to the same value, duplicates will be created, and the multiplicities must be merged to maintain the MSet
invariant.
For example, given:
ms = MSet[(1,2),(2,3)]
f x = 1
The result of mapMSet f ms
would be:
MSet [(1,5)]
I understand that the Functor
type class requires the implementation of fmap:
fmap :: (a -> b) -> f a -> f b
Moreover, fmap
must satisfy the Functor Laws:
I'm struggling to understand why it is not possible to define an instance of Functor
for MSet
using mapMSet
as fmap
. I know the issue is related to maintaining the MSet
invariant, but i'm not entirely clear on how this prevents MSet
from satisfying the laws of Functor.
Could someone explain why MSet cannot be an instance of Functor? Specifically:
Functor
laws?mapMSet
?Functor
?To determine whether MSet
can be a valid Functor
instance, I believe we need to analyze two specific cases:
mapMSet
includes a subroutine that, after applying the function f
to each element, merges all pairs (x, n)
and (x, n')
into a single pair (x, n + n')
.
mapMSet
does not perform this merging functionality.
mapMSet
merges duplicatesIn this scenario, the Functor law for identity might be violated. Consider the following example:
ms = MSet [(1, 1), (1, 2)]
If we apply fmap id
to ms
fmap id ms
= mapMSet id ms
= MSet [(1,3)] -- After merging duplicates
In this case:
ms != fmap id ms
Here, the identity law is violated because fmap id
does not leave the multiset unchanged. However, the resulting MSet
is well-formed cause the merging ensures that the invariant of the data type is maintained.
mapMSet
does not merge duplicatesIn this scenario, both the identity and composition laws hold because mapMSet
behaves like a standard fmap
without altering the structure of the multiset beyond applying the function f
.
However, the resulting MSet
may not respect its invariant.
For instance, consider applying a composition where one of the functions is a constat:
f x = 1 -- Maps all elements to 1
g x = id -- Identity
If we apply fmap (f . g)
:
fmap (f . g) ms
= mapMSet (f . g) ms
= MSet [(1,1),(1,2)] -- Duplicates!
The resulting MSet
is not well-formed, but the composition law holds:
fmap (f . g) ms == fmap f (fmap g ms)
Any clarification would be greatly appreciated!
How does maintaining the invariant (merging duplicates) interfere with the
Functor
laws?
Here we must draw a distinction between functors and Functor
s. It is perfectly possible to define a functor which merges duplicates without violating the functor laws; but not all functors are Functor
s. The issue is that Functor
describes specifically functors whose source category and target category are both Hask, the category with Haskell types as objects and Haskell functions as arrows. (If you want to be very careful about this definition, you run into some issues around composition laws that aren't super germane to the current discussion.)
The important problem here is that the source category is Hask; that is, fmap
must be able to handle all functions that unify with a -> b
, even when b
is not an instance of Eq
. Without an Eq
instance, it is not possible for fmap
to merge duplicates. A standard example here is to look at a function type with an infinite domain; for example, merging duplicates inside the implementation of fmap :: (a -> Integer -> Bool) -> MSet a -> MSet (Integer -> Bool)
is not computable.
On the other hand, if you allow yourself to talk about functors more generally, you can identify subcategories of Hask; for example you might consider the source category Hask/Eq whose objects are instances of Eq
and whose arrows are equality-preserving functions, and the target category Hask/MSet whose objects are MSet
s and whose arrows are invariant-preserving functions. Then there is no problem implementing fmapEq :: (Eq a, Eq b) => (a -> b) -> MSet a -> MSet b
and merging duplicates inside the implementation -- in fact it would be required to merge duplicates to call this a functor to the target category.
There are several libraries on Hackage that provide classes for more generalized functors, but experience shows the cognitive overhead of using them is rarely paid back. (Runtime overhead can usually be made negligible with appropriate newtype
shenanigans.)
What happens if i decide not to merge duplicates in
mapMSet
?
Nothing much. You even get a perfectly fine Functor
. You just don't get to assume that your structural invariant holds when handed an MSet
, which is a serious efficiency bummer and may well defeat the entire purpose of having the type in the first place in some situations.
Can a data type with structural invariants ever be a valid instance of
Functor
?
Yes. Besides the commentary above discussing more general functors, you can have the more specific Functor
when the structural invariants only involve types other than the one being fmap
'd. A typical example is dictionaries, implemented as balanced trees, which have the instance
instance Ord k => Functor (Map k)
and whose fmap
implementation maintains the appropriate structural invariant.
There is also the free functor construction that turns any type of the appropriate kind into a Functor
:
data FreeF f a where FreeF :: (a -> b) -> f a -> FreeF f b
instance Functor (FreeF f) where
fmap f (FreeF g as) = FreeF (f . g) as
If you use this, you would then want to define an observation function with a type like
observe :: Eq a => FreeF MSet a -> MSet a
observe (FreeF f as) = mergeDuplicates [(f a, n) | (a, n) <- as]
The unfortunate property of this is that merging must be done at FreeF MSet
observation time; multiple observations perform the merging multiple times. Sometimes this is okay, sometimes it kills performance dead. (In particular, if you begin writing a Monad
instance with the analogous trick, you quickly find that merging at every bind can be an exponential speedup in many very realistic cases.)
Practically speaking, the FreeF
construction is essentially the one discussed in the previous section -- which doesn't perform merges in mapMSet
-- just with a type-level marker distinguishing values which satisfy the structural invariant from values that don't. Of course that marker can be handy in some cases, but FreeF
definitely isn't any sort of magic bullet.