haskell

Haskell list sum generalization


I was working in Python and created an example of how we can generalize the usual sumList function to sum any arbitrary list of lists:

  1. To sum any list of numbers in Python we can write:
def sumList(xs): 
  if xs==[]:  
    return 0 
  else:  
    return xs[0]+sumList(xs[1:])

And in Haskell:

sumList [] = 0 
sumList (x:xs) = x + sumList xs
  1. but to sum an arbitrary nested list, in Python we have:
def sumList2(xs): 
  if xs==[]:  
    return 0 
  elif isinstance(xs,int):  
    return xs 
  else:  
    return sumList2(xs[0])+sumList2(xs[1:])

But the following Haskell code throws a compiler error in the type inference:

sumList [] = 0 
sumList (x:xs) = sumList x + sumList xs 
sumList x = x
<interactive>:2:38: error:
• Couldn't match expected type ‘t’ with actual type ‘[t]’
‘t’ is a rigid type variable bound by
the inferred type of sumList :: t -> [t]
at <interactive>:(1,1)-(3,13)
• In the first argument of ‘sumList’, namely ‘xs’
In the second argument of ‘(+)’, namely ‘sumList xs’
In the expression: sumList x + sumList xs
• Relevant bindings include
xs :: [t] (bound at <interactive>:2:12)
x :: t (bound at <interactive>:2:10)
sumList :: t -> [t] (bound at <interactive>:1:1)

I am using IHaskell, for what I understand, when using sumlist [] = 0 as pattern matching, Haskell inferrs that sumList :: [a] -> number, but then when I pass sumList x in the next line, x is the head of the list, so sumList :: a -> number, so it gets an absurd. What I figured is that I need to match the case where the input is a list and do the 2 first lines, and then the case where the input is an integer and do the last line, but I cannot figure out how to do it.

I am trying to create a sum type of Either Int [a], but I don't know if that is the better answer, because I want to let the compiler figure out everything related to types (ultimately I will write the type definition, but my initial thesis is that contrary to popular belief, it is possible to do quick experimentation in Haskell, and so that means to not be concerned with types in the first stage of programing). Note the shortcomings of the for loop, it would be very difficult or impossible to generalize the sum of the list using for loops, because of the linearity of the loop, I think it is a very compelling argument for the scaling of machine learning software.


Solution

  • Int, [Int], [[Int]], and so on are different types, and the type of a function is chosen at the moment it's called. If we expect our function to return an Int, and we have the two clauses

    sumList (x:xs) = {- does not matter what goes here -}
    sumList x = x
    

    then we're guaranteed that no matter which input type we choose, we have a type error. If we choose Int, then the first clause is an error because x:xs can only match lists; if we choose any list type, then the second clause is an error because it does not return an Int.

    However, not all is lost. One thing we can do is split these two clauses into separate functions that happen to share a name using type classes.

    class Sums a where sums :: a -> Int
    instance Sums Int where sums x = x
    instance Sums a => Sums [a] where
        sums [] = 0
        sums (x:xs) = sums x + sums xs
    

    Now when we call sums, we are never calling a function that has both clauses; instead, we are asking the compiler to call one or the other of these functions depending on which concrete type was chosen. The line

        sums (x:xs) = sums x + sums xs
    

    may look suspicious, since we know that x and xs do not have the same type. But we are rescued by the fact that these are not calling the same sums function! Again, the type of a function is chosen at the moment it's called, and here there are two different calls that may choose two different types, at which point the compiler will choose different functions to call.

    The other option we have is to make a new type. The trouble with Int, [Int], [[Int]], and so on being different types is that this means we must statically know how deeply nested a list is. If we don't know that, then we need a type that can express the fact that its nesting depth is only known dynamically, and which tracks the data it needs to know to determine how deep that nesting is. One way to do so would be:

    data DynList a = NotNested a | Nested [DynList a]
    
    sumDynList :: DynList Int -> Int
    sumDynList (NotNested x) = x
    sumDynList (Nested []) = 0
    sumDynList (Nested (x:xs)) = sumDynList x + sumDynList (Nested xs)
    

    I've written this to match as closely as possible your original code, but in reality I'd probably write the nested case this way:

    sumDynList (NotNested x) = x
    sumDynList (Nested xs) = sum (map sumDynList xs)
    

    This particular type allows different amounts of nesting at different points in the data structure. If the intent is instead that there is a single nesting depth, and all list elements have that same depth, then we need a slightly more advanced technique.

    data SharedDepthList a = Depth0 a | Deeper (SharedDepthList [a])
    
    sumSharedDepthList :: SharedDepthList Int -> Int
    sumSharedDepthList = go id where
        go :: (a -> Int) -> SharedDepthList a -> Int
        go f (Depth0 x) = f x
        go f (Deeper xs) = go (sum . map f) xs
    

    I think mostly I do not recommend this. Data structures with this much precision about the allowed possible values get quite unwieldy in practice.