Why can't MSet be an instance of Functor?

I am working with a custom data type MSet that represents a multiset. The multiset is defined as:

data MSet a = MSet [(a, Int)]

Where each tuple (a, Int) represents an element a and its multiplicity (a positive integer). The MSet type mantains an invariant:

There are no duplicate elements in the list (no two tuples have the same a).
When adding elements already present in the multiset, the multiplicities should be summed.

I was tasked with defining a function, mapMSet:

mapMSet :: (a -> b) -> MSet a -> MSet b

This function applies a given function f :: a -> b to all elements of the MSet. However, if f maps two distinct elements to the same value, duplicates will be created, and the multiplicities must be merged to maintain the MSet invariant.

For example, given:

ms = MSet[(1,2),(2,3)]
f x = 1

The result of mapMSet f ms would be:

MSet [(1,5)]

I understand that the Functor type class requires the implementation of fmap:

fmap :: (a -> b) -> f a -> f b

Moreover, fmap must satisfy the Functor Laws:

Identity
Composition

I'm struggling to understand why it is not possible to define an instance of Functor for MSet using mapMSet as fmap. I know the issue is related to maintaining the MSet invariant, but i'm not entirely clear on how this prevents MSet from satisfying the laws of Functor.

Could someone explain why MSet cannot be an instance of Functor? Specifically:

How does maintaining the invariant (merging duplicates) interfere with the Functor laws?
What happens if i decide not to merge duplicates in mapMSet?
Can a data type with structural invariants ever be a valid instance of Functor?

How i interpret this:

To determine whether MSet can be a valid Functor instance, I believe we need to analyze two specific cases:

mapMSet includes a subroutine that, after applying the function f to each element, merges all pairs (x, n) and (x, n') into a single pair (x, n + n').
mapMSet does not perform this merging functionality.

Case 1: `mapMSet` merges duplicates

In this scenario, the Functor law for identity might be violated. Consider the following example:

ms = MSet [(1, 1), (1, 2)]

If we apply fmap id to ms

fmap id ms
= mapMSet id ms
= MSet [(1,3)] -- After merging duplicates

In this case:

ms != fmap id ms

Here, the identity law is violated because fmap id does not leave the multiset unchanged. However, the resulting MSet is well-formed cause the merging ensures that the invariant of the data type is maintained.

Case 2: `mapMSet` does not merge duplicates

In this scenario, both the identity and composition laws hold because mapMSet behaves like a standard fmap without altering the structure of the multiset beyond applying the function f. However, the resulting MSet may not respect its invariant.

For instance, consider applying a composition where one of the functions is a constat:

f x = 1 -- Maps all elements to 1
g x = id -- Identity

If we apply fmap (f . g):

fmap (f . g) ms
= mapMSet (f . g) ms
= MSet [(1,1),(1,2)] -- Duplicates!

The resulting MSet is not well-formed, but the composition law holds:

fmap (f . g) ms == fmap f (fmap g ms)

Any clarification would be greatly appreciated!

Solution

How does maintaining the invariant (merging duplicates) interfere with the Functor laws?

Here we must draw a distinction between functors and Functors. It is perfectly possible to define a functor which merges duplicates without violating the functor laws; but not all functors are Functors. The issue is that Functor describes specifically functors whose source category and target category are both Hask, the category with Haskell types as objects and Haskell functions as arrows. (If you want to be very careful about this definition, you run into some issues around composition laws that aren't super germane to the current discussion.)

The important problem here is that the source category is Hask; that is, fmap must be able to handle all functions that unify with a -> b, even when b is not an instance of Eq. Without an Eq instance, it is not possible for fmap to merge duplicates. A standard example here is to look at a function type with an infinite domain; for example, merging duplicates inside the implementation of fmap :: (a -> Integer -> Bool) -> MSet a -> MSet (Integer -> Bool) is not computable.

On the other hand, if you allow yourself to talk about functors more generally, you can identify subcategories of Hask; for example you might consider the source category Hask/Eq whose objects are instances of Eq and whose arrows are equality-preserving functions, and the target category Hask/MSet whose objects are MSets and whose arrows are invariant-preserving functions. Then there is no problem implementing fmapEq :: (Eq a, Eq b) => (a -> b) -> MSet a -> MSet b and merging duplicates inside the implementation -- in fact it would be required to merge duplicates to call this a functor to the target category.

There are several libraries on Hackage that provide classes for more generalized functors, but experience shows the cognitive overhead of using them is rarely paid back. (Runtime overhead can usually be made negligible with appropriate newtype shenanigans.)

What happens if i decide not to merge duplicates in mapMSet?

Nothing much. You even get a perfectly fine Functor. You just don't get to assume that your structural invariant holds when handed an MSet, which is a serious efficiency bummer and may well defeat the entire purpose of having the type in the first place in some situations.

Can a data type with structural invariants ever be a valid instance of Functor?

Yes. Besides the commentary above discussing more general functors, you can have the more specific Functor when the structural invariants only involve types other than the one being fmap'd. A typical example is dictionaries, implemented as balanced trees, which have the instance

instance Ord k => Functor (Map k)

and whose fmap implementation maintains the appropriate structural invariant.

There is also the free functor construction that turns any type of the appropriate kind into a Functor:

data FreeF f a where FreeF :: (a -> b) -> f a -> FreeF f b
instance Functor (FreeF f) where
    fmap f (FreeF g as) = FreeF (f . g) as

If you use this, you would then want to define an observation function with a type like

observe :: Eq a => FreeF MSet a -> MSet a
observe (FreeF f as) = mergeDuplicates [(f a, n) | (a, n) <- as]

The unfortunate property of this is that merging must be done at FreeF MSet observation time; multiple observations perform the merging multiple times. Sometimes this is okay, sometimes it kills performance dead. (In particular, if you begin writing a Monad instance with the analogous trick, you quickly find that merging at every bind can be an exponential speedup in many very realistic cases.)

Practically speaking, the FreeF construction is essentially the one discussed in the previous section -- which doesn't perform merges in mapMSet -- just with a type-level marker distinguishing values which satisfy the structural invariant from values that don't. Of course that marker can be handy in some cases, but FreeF definitely isn't any sort of magic bullet.

Why can't MSet be an instance of Functor?

How i interpret this:

Case 1: mapMSet merges duplicates

Case 2: mapMSet does not merge duplicates

Case 1: `mapMSet` merges duplicates

Case 2: `mapMSet` does not merge duplicates