haskellcompiler-errorstuplessizecompiler-specific

Why are GHC tuples limited to size 62?


From the the Haskell 98 report:

There is no upper bound on the size of a tuple, but some Haskell implementations may restrict the size of tuples, and limit the instances associated with larger tuples. However, every Haskell implementation must support tuples up to size 15, together with the instances for Eq, Ord, Bounded, Read, and Show. (...)

However, it's well known that GHC does not support tuples of size greater than 62. Here's what happens when I try to create a tuple of size 63 in GHCi:

<interactive>:1:1: error:
    A 63-tuple is too large for GHC
      (max size is 62)
      Workaround: use nested tuples or define a data type

I recognise that this is in accordance with the Haskell 98 specification, and also that a tuple of size greater than 62 is likely to be highly unnecessary, but I don't understand why exactly this is as it is in GHC.


To summarise:

In addition:


Solution

  • I think the speculation re: the timing of this change in the comments is wrong. First, as far as I can tell, the limit has existed since LONG before 6.12.1. As can be seen in Trac #98 from November 2002, in version 5.02.2, the limit was 37 (instead of 62), and trying to use a larger tuple generated a mysterious message:

    Switches_tupel.lhs:1:
    Failed to find interface decl for
    `PrelTup.(,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,)'
        from module `PrelTup'
    

    Simon Peyton-Jones fixed the bug by having the compiler check the size earlier in the compilation pipeline and generate a nicer error message (visible in Git commit b44c6881). By the time this commit was made, the limit had already been increased from 37 to 62 (Git commit 9af77fa4, which integrated Template Haskell work into the HEAD), so GHC 5.04 was released with a 62-tuple limit and a better error message.

    I believe the original Trac #98 bug points to the reason for a limit. In ghc/compiler/prelude/TysWiredIn.hs, a set of tuple type and data constructors is preallocated:

    boxedTupleArr, unboxedTupleArr :: Array Int (TyCon,DataCon)
    boxedTupleArr   = listArray (0,mAX_TUPLE_SIZE) [mk_tuple Boxed   i 
                        | i <- [0..mAX_TUPLE_SIZE]]
    unboxedTupleArr = listArray (0,mAX_TUPLE_SIZE) [mk_tuple Unboxed i 
                        | i <- [0..mAX_TUPLE_SIZE]]
    

    where mAX_TUPLE_SIZE is the 62-tuple limit in question. However, the functions that actually use these preallocated arrays are happy to generate bigger constructors on demand ("Build one specially"):

    tupleTyCon :: Boxity -> Arity -> TyCon
    tupleTyCon sort i | i > mAX_TUPLE_SIZE 
                    = fst (mk_tuple sort i)  -- Build one specially
    tupleTyCon Boxed   i = fst (boxedTupleArr   ! i)
    tupleTyCon Unboxed i = fst (unboxedTupleArr ! i)
    

    and this is what the compiler used to do before Simon added the error message for 5.04 -- it built one specially.

    Unfortunately, this caused an error (not a segfault, just an error) later in the compilation process when the compiler couldn't find an interface definition for the too-large tuple in the list given in ghc/libraries/ghc-prim/GHC/Tuple.hs. As per the (slightly outdated) comments in TysWiredIn.hs under the heading The tuple types, the declarations in Tuple.hs are used to construct the info tables and entry code for tuple constructors, even though these could theoretically be programmatically generated on the fly for arbitrarily large tuples.

    So, what does this mean for modern-day GHC? Well, for the same technical reasons described above, even though the compiler is prepared to generate arbitrarily large tuples, there's a limit imposed by the fact that they require matching declarations in .../GHC/Tuple.hs.

    I ran a few experiments, compiling GHC from source with the tuple length check disabled. The resulting compiler successfully compiled and ran the following program with a 100-tuple:

    a = (False,...,False)  -- imagine 100 Falses
    main = let (x,_,...,_) = a
           in print x
    

    and it printed "False". It worked fine when I modified it to grab the last element of the same tuple:

    a = (False,...,False)  -- imagine 100 Falses
    main = let (_,...,_,x) = a
           in print x
    

    However, the program:

    a = (False,...,False)  -- imagine 100 Falses
    main = let (x,_,...,_,y) = a
           in print (x,y)
    

    failed with a linkage error:

    [1 of 1] Compiling Main             ( Tuple.hs, Tuple.o )
    Linking Tuple ...
    Tuple.o(.data+0x0): error: undefined reference to 'ghczmprim_GHCziTuple_Z100T_con_info'
    collect2: error: ld returned 1 exit status
    `gcc' failed in phase `Linker'. (Exit code: 1)
    

    I suspect that for the first two programs, the compiler optimized away the reference to the missing constructor, but the final program needed it. After I added a declaration for a 100-tuple in Tuple.hs and rebuilt the compiler, all three programs compiled and ran fine.

    In short, compiling the manually constructed list of tuples in Tuple.hs generates required data structures to support tuples up to size 62, and no one has been sufficiently motivated to re-implement this data structure generation to be independent of the Tuple.hs crutch. If they did, GHC would probably support tuples of arbitrary size.

    As an aside, the note in Tuple.hs about Manuel's segfault (referenced in one of the comments to this question) dates from July 2001, when it was checked into libraries/base/Data/Tuple.hs, so whatever it was about, it had nothing to do with GHC 6.12.1. This issue is probably the reason the max was set to 62 by Simon, but the limit no longer seems to apply with modern GHC.