haskell

How, why and when to use the ".Internal" modules pattern?


I've seen a couple of packages on Hackage that contain modules with .Internal as the last component of their name (e.g. Data.ByteString.Internal).

Those modules are usually not properly browsable in Haddock (though they may show up nevertheless) and should not be used by client code; they contain definitions that are either re-exported from exposed modules or used only internally.

Now my question(s) about this library organization pattern are:


Solution

  • Internal modules are generally modules that expose the internals of a package and thereby break its encapsulation.

    To take ByteString as an example: when you normally use ByteStrings, they are used as opaque data types; a ByteString value is atomic, and its representation is uninteresting. The functions in Data.ByteString operate on ByteString values as a whole rather than on a raw Ptr CChar or similar.

    This is a good thing; it means that the ByteString authors managed to make the representation abstract enough that all the details about the ByteString can be hidden completely from the user. Such a design leads to encapsulation of functionality.
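
    As a small illustration of that opaque style, the sketch below uses only functions exported by Data.ByteString; the name greeting is made up for the example, and nothing in it depends on how the bytes are stored.

    ```haskell
    import qualified Data.ByteString as BS

    -- Hypothetical value for the example: the two bytes of "hi".
    greeting :: BS.ByteString
    greeting = BS.pack [104, 105]

    main :: IO ()
    main = do
      -- Only public API calls; the underlying buffer never appears.
      let doubled = BS.append greeting greeting
      print (BS.length doubled)
      print (BS.unpack (BS.reverse doubled))
    ```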

    The Internal modules are for people who wish to work with the internals of an encapsulated concept in order to widen the encapsulation.

    For example, you might want to make a new BitString data type, and you want users to be able to convert a ByteString into a BitString without copying any memory. In order to do this, you can't use opaque ByteStrings, because that doesn't give you access to the memory that represents the ByteString. You need access to the raw memory pointer to the byte data. This is what the Internal module for ByteStrings provides.

    You should then make your BitString data type encapsulated as well, thus widening the encapsulation without breaking it. You are then free to provide your own BitString.Internal module, exposing the innards of your data type, for users that might want to inspect its representation in turn.
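
    A minimal sketch of the BitString idea might look like the following. It assumes that Data.ByteString.Internal exports toForeignPtr (the exact exports differ between bytestring versions), and the BitString type, its fields, fromByteString, bitLength and bitAt are all hypothetical names invented for this example. In a real package the data declaration would live in BitString.Internal, and the public BitString module would export the type without its constructor.

    ```haskell
    import qualified Data.ByteString as BS
    import Data.ByteString.Internal (toForeignPtr)
    import Data.Bits (testBit)
    import Data.Word (Word8)
    import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
    import Foreign.Storable (peekByteOff)
    import System.IO.Unsafe (unsafePerformIO)

    -- Hypothetical BitString: it shares the ByteString's buffer, no copy is made.
    data BitString = BitString
      { bitsPtr    :: !(ForeignPtr Word8)  -- the buffer owned by the ByteString
      , bitsOffset :: !Int                 -- byte offset into that buffer
      , bitsLength :: !Int                 -- length in bits
      }

    -- O(1) conversion: only the pointer, offset and length are reused.
    fromByteString :: BS.ByteString -> BitString
    fromByteString bs =
      let (fp, off, len) = toForeignPtr bs
      in BitString fp off (len * 8)

    bitLength :: BitString -> Int
    bitLength = bitsLength

    -- Read bit i, counting from the most significant bit of each byte.
    -- Returns Nothing when i is out of range. The buffer is treated as
    -- immutable, which is the assumption that makes unsafePerformIO tolerable.
    bitAt :: BitString -> Int -> Maybe Bool
    bitAt (BitString fp off n) i
      | i < 0 || i >= n = Nothing
      | otherwise = unsafePerformIO $ withForeignPtr fp $ \p -> do
          w <- peekByteOff p (off + (i `div` 8)) :: IO Word8
          pure (Just (testBit w (7 - (i `mod` 8))))

    main :: IO ()
    main = do
      let bits = fromByteString (BS.pack [0x80, 0x01])
      print (bitLength bits)
      print (map (bitAt bits) [0, 1, 15, 16])
    ```

    Clients of such a BitString would only see fromByteString, bitLength and bitAt, so the pointer handling stays inside the new abstraction.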

    If a library does not provide an Internal module (or similar), there is no way to get at the internal representation of its types, and someone writing e.g. BitString is forced to (ab)use things like unsafeCoerce to get hold of the underlying memory, and things get ugly.

    The definitions that should be put in an Internal module are the actual data declarations for your data types:

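    -- File Bla/Internal.hs: exports everything, including the constructors
    module Bla.Internal where
    
    data Bla = Blu Int | Bli String
    
    -- ...
    
    -- File Bla.hs: the public interface
    module Bla (Bla, makeBla) where -- ONLY export the Bla type, not the constructors
    
    import Bla.Internal
    
    makeBla :: String -> Bla -- Some function only dealing with the opaque type
    makeBla = undefined -- placeholder; a real body would build the value from Blu or Bli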