haskellfunctional-programmingfold

Is implementing the words function possible without a postprocessing step after folding?


Real World Haskell, chapter 4, page 98 of the printed version asks if words can be implemented using folds, and this is my question too:

Is it possible? If not, why? If it is, how?

I came up with the following, which is based on the idea that each non-space should be prepended to the last word in the output list (this happens in the otherwise guard), and that a space should trigger the appending of an empty word to the output list if there is not one already (this is handled in the if-then-else).

myWords :: String -> [String]
myWords = foldr step [[]]
  where
    step x yss@(y:ys)
      | x == ' ' = if y == "" then yss else "":yss
      | otherwise = (x:y):ys

Clearly this solution is wrong, since leading spaces in the input string result in one leading empty string in the output list of strings.

At the link above, I've looked into several of the proposed solutions for other readers, and many of them work similarly to my solution, but they generally "post-process" the output of the fold, for instance by tailing it if there is an empty leading string.

Other approaches use tuples (actually just pairs), so that the fold deals with the pair and can well handle the leading/trailing spaces.

In all these approaches, foldr (or another fold, fwiw) is not the function that provides the final output out of the box; there's always something else with has to adjust the output somehow.

Therefore I go back to the initial question and ask if it is actually possible to implement words (in a way that it correctly handles trailing/leading/repeated spaces) using folds. By using folds I mean that the folding function has to be the outermost function:

myWords :: String -> [String]
myWords input = foldr step seed input

Solution

  • If I understand correctly, your requirements include

    (1) words "a b c" == words " a b c" == ["a", "b", "c"]
    (2) words "xa b c" == ["xa", "b", "c"] /= ["x", "a", "b", "c"] == words "x a b c"
    

    This implies that we can not have

    words = foldr step base
    

    for any step and base.

    Indeed, if we had that, then

    words "xa b c"
    = def words and foldr
    step 'x' (words "a b c")
    = (1)
    step 'x' (words " a b c")
    = def words and foldr
    words "x a b c"
    

    and this contradicts (2).

    You definitely need some post-processing after the foldr.