haskell template-haskell storable ghc-generics derivingvia

Would it be possible to derive Data.Vector.Unbox via GHC's generic deriving?


It's possible to derive Storable via GHC's generic deriving mechanism: http://hackage.haskell.org/package/derive-storable (and https://hackage.haskell.org/package/derive-storable-plugin for performance). The only library I can find for deriving Data.Vector.Unbox, however, uses Template Haskell: http://hackage.haskell.org/package/vector-th-unbox. It also requires the user to write a little code, so it's not entirely automatic.

My question is: could a library like derive-storable also exist for Unbox, or is this not possible due to some fundamental way in which Unbox differs from Storable? If the latter, does that mean it's also not possible to create a library that automatically derives Unbox for any Storable type? I could not find such a library.

I ask because ideally I'd like to avoid Template Haskell and the manual annotations necessary for using vector-th-unbox.


Solution

  • Say we had some Generic_ class to convert between our own types and some uniform representation which happens to have an Unbox instance (which amounts to both MVector and Vector instances for the Unboxed variants):

    class Generic_ a where
      type Rep_ (a :: Type) :: Type
      to_ :: a -> Rep_ a
      from_ :: Rep_ a -> a
    

    Then we can use that to obtain generic implementations of the methods of MVector/Vector:

    -- (auxiliary definitions of CMV and uncoercemv at the end of this block)
    -- vector imports (see gist at the end for a compilable sample)
    import qualified Data.Vector.Unboxed as U
    import qualified Data.Vector.Unboxed.Mutable as UM
    import Data.Vector.Generic.Mutable.Base (MVector(..))
    -- base imports for the coercions used below
    import Data.Coerce (Coercible, coerce)
    import Unsafe.Coerce (unsafeCoerce)
    
    
    
    -- MVector
    
    gbasicLength :: forall a s. CMV s a => UM.MVector s a -> Int
    gbasicLength = basicLength @UM.MVector @(Rep_ a) @s . coerce
    
    gbasicUnsafeSlice :: forall a s. CMV s a => Int -> Int -> UM.MVector s a -> UM.MVector s a
    gbasicUnsafeSlice i j = uncoercemv . basicUnsafeSlice @UM.MVector @(Rep_ a) @s i j . coerce
    
    -- etc.
    
    
    -- idem Vector
    
    
    -- This constraint holds when the UM.MVector data instance of a is
    -- representationally equivalent to the data instance of its generic
    -- representation (Rep_ a).
    type CMV s a = (Coercible (UM.MVector s a) (UM.MVector s (Rep_ a)), MVector UM.MVector (Rep_ a))
    
    -- Sadly coerce doesn't seem to want to solve this correctly so we use
    -- unsafeCoerce as a workaround.
    uncoercemv :: CMV s a => UM.MVector s (Rep_ a) -> UM.MVector s a
    uncoercemv = unsafeCoerce
    

    Now if we have some generic type

    data MyType = MyCons Int Bool ()
    

    we can define its Generic_ instance through its isomorphism to a tuple:

    instance Generic_ MyType where
      type Rep_ MyType = (Int, Bool, ())
      to_ (MyCons a b c) = (a, b, c)
      from_ (a, b, c) = MyCons a b c
    

    From there, a totally generic recipe gives its Unbox instance. If you have YourType instead, with its own Generic_ instance, you can take this code and literally replace MyType with YourType.

    newtype instance UM.MVector s MyType
      = MVMyType { unMVMyType :: UM.MVector s (Rep_ MyType) }
    
    instance MVector UM.MVector MyType where
      basicLength = gbasicLength
      basicUnsafeSlice = gbasicUnsafeSlice
      -- etc.
    
    -- idem (Vector U.Vector MyType)
    
    -- MVector UM.MVector & Vector U.Vector   =   Unbox
    instance Unbox MyType
    

    In theory all this boilerplate could be automated with internal language features (as opposed to TemplateHaskell or CPP). But there are various issues that get in the way in the current state of things.

    First, Generic_ is essentially Generic from GHC.Generics. However, the uniform representation that gets derived by GHC is not in terms of tuples (,) but in terms of somewhat ad-hoc type constructors (:+:, :*:, M1, etc.), which lack Unbox instances.
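    One way to bridge that gap, sketched here under assumptions, is to translate the GHC.Generics representation into nested pairs with a type family. The class GTuple, the family Tup, and the helpers toTuple/fromTuple are hypothetical names invented for this illustration; they come from neither vector nor the gist. The sketch handles single-constructor types only (there is no :+: instance), and the exact nesting of the resulting pairs depends on how GHC balances the product.

    ```haskell
    {-# LANGUAGE DeriveGeneric, FlexibleContexts, KindSignatures, TypeFamilies, TypeOperators #-}

    import Data.Kind (Type)
    import GHC.Generics

    -- Hypothetical class: maps a GHC.Generics representation f
    -- to a type built only from (), pairs, and the field types.
    class GTuple (f :: Type -> Type) where
      type Tup f :: Type
      gto   :: f x -> Tup f
      gfrom :: Tup f -> f x

    -- A constructor without fields becomes ().
    instance GTuple U1 where
      type Tup U1 = ()
      gto U1 = ()
      gfrom () = U1

    -- A single field is represented by its own type.
    instance GTuple (K1 i c) where
      type Tup (K1 i c) = c
      gto (K1 c) = c
      gfrom = K1

    -- Metadata wrappers are erased.
    instance GTuple f => GTuple (M1 i m f) where
      type Tup (M1 i m f) = Tup f
      gto (M1 x) = gto x
      gfrom = M1 . gfrom

    -- A product of fields becomes a pair.
    instance (GTuple f, GTuple g) => GTuple (f :*: g) where
      type Tup (f :*: g) = (Tup f, Tup g)
      gto (x :*: y) = (gto x, gto y)
      gfrom (x, y) = gfrom x :*: gfrom y

    -- Deliberately no instance for (:+:): sum types are out of scope here.

    toTuple :: (Generic a, GTuple (Rep a)) => a -> Tup (Rep a)
    toTuple = gto . from

    fromTuple :: (Generic a, GTuple (Rep a)) => Tup (Rep a) -> a
    fromTuple = to . gfrom

    data MyType = MyCons Int Bool ()
      deriving (Show, Eq, Generic)
    ```

    Since vector already provides Unbox instances for () and for pairs of Unbox types, Tup (Rep MyType) could play the role of Rep_ MyType above. This only removes the need to write the isomorphism by hand; the data instances and the method boilerplate remain.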

    Second, MVector and Vector have quite a few methods. To avoid listing them all, one might hope to use DerivingVia (or GeneralizedNewtypeDeriving). Neither applies, however, because a couple of methods are polymorphic in a monad, which prevents the coercions (e.g., basicUnsafeNew). For now, the easiest way I can think of to abstract this is a CPP macro. In fact, the vector package uses that technique internally, and it might be reusable somehow. I believe properly addressing those issues requires a deep redesign of the Vector/MVector architecture.
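    The coercion problem can be reproduced without vector. The following is a minimal sketch, assuming a made-up class Container whose method fresh returns its result inside an arbitrary monad, much like basicUnsafeNew does. The names Container, size, fresh, and Age are hypothetical. As far as I understand, the commented-out deriving clause is rejected because GHC cannot know the role of the argument of an unknown m, so it cannot coerce m Int to m Age.

    ```haskell
    import Data.Coerce (coerce)

    -- Hypothetical class with one plain method and one method whose
    -- result lives in an arbitrary monad (the shape of basicUnsafeNew).
    class Container a where
      size  :: a -> Int
      fresh :: Monad m => Int -> m a

    instance Container Int where
      size  = id
      fresh = pure

    newtype Age = Age Int
      deriving (Show, Eq)
      -- deriving newtype Container
      -- ^ Expected to be rejected (with GeneralizedNewtypeDeriving and
      --   DerivingStrategies enabled): it would need
      --   Coercible (m Int) (m Age) for an unknown monad m.

    -- The instance has to be written by hand instead.
    instance Container Age where
      -- A method without the monad can still be coerced directly.
      size = coerce (size :: Int -> Int)
      -- The monadic method needs an explicit fmap over the wrapper.
      fresh n = fmap Age (fresh n)
    ```

    Every monadic method of MVector would need this kind of hand-written wrapping, which is the boilerplate a CPP macro can generate.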

    Gist (not complete, but compilable): https://gist.github.com/Lysxia/c7bdcbba548ee019bf6b3f1e388bd660