listrecursionfunctional-programmingpattern-matchingocaml

deleting duplicates tail recursively in OCaml


I tried to write my own solution for this exercise by iterating through a list with a empty complst list where all non duplicates are inserted into and then get returned. I know it is a over complicated approach after looking up the solution but would still like to understand why the pattern matching does not work as intended:

let compress list =
  let rec aux complst lst =
    match lst with 
    | [] -> complst
    | a :: (b :: c) -> if a = b then aux complst (b::c) else aux (a::complst) (b::c)
    | x -> x  
  in aux [] list;;
val comp : 'a list -> 'a list = <fun>

Regardless of the input, the output is always a list with only the last element:

 compress [1;1;2;2;3];;
- : int list = [3]

 compress [1;2;3];;
- : int list = [3]

Solution

  • Pattern matching

    Your pattern-matching matches against three patterns:

    1. The empty list: []
    2. The list with at least two elements: a :: (b :: c)
    3. A catch-all, which must by process of elimination be a list with a single element.

    Consider what happens when we evaluate your example:

    compress [1; 1; 2; 2; 3]
    aux [] [1; 1; 2; 2; 3]
    aux [] [1; 2; 2; 3]
    aux [1] [2; 2; 3]
    aux [1] [2; 3]
    aux [2; 1] [3]
    [3]
    

    Oops, as soon as it hit lst being [3] it just returned it.

    Let's rewrite your function to handle that single element list by adding to complst.

    let compress lst =
      let rec aux complst lst =
        match lst with 
        | [] -> complst
        | [x] -> aux (x::complst) []
        | a :: (b :: c) -> 
          if a = b then aux complst (b::c) 
          else aux (a::complst) (b::c)
      in 
      aux [] list
    

    Now:

    compress [1; 1; 2; 2; 3]
    aux [] [1; 1; 2; 2; 3]
    aux [] [1; 2; 2; 3]
    aux [1] [2; 2; 3]
    aux [1] [2; 3]
    aux [2; 1] [3]
    aux [3; 2; 1] []
    [3; 2; 1]
    

    Clean up and reversing the resulting list

    Of course, there are also ways to clean up your code a bit using a conditional guard and _ for values you don't need to bind names to. You probably also want to reverse your accumulator.

    let compress lst =
      let rec aux complst lst =
        match lst with 
        | [] -> List.rev complst
        | [x] -> aux (x::complst) []
        | a :: (b :: _ as tl) when a = b -> aux complst tl
        | a :: (_ :: _ as tl) -> aux (a::complst) tl   
      in
      aux [] lst
    

    Fold

    When you see this pattern of iterating over a list one element at a time and accumulating a new value, you can usually map that pretty well to List.fold_left.

    let compress lst =
      List.(
        lst 
        |> fold_left 
             (fun i x -> 
                match i with 
                | (x'::_) when x = x' -> i 
                | _ -> x::i) 
             [] 
        |> rev
      )
    

    Because List.fold_left can only be aware of one element at a time on the list, the function we pass as its first argument can't be aware of the next element in the list. But it is aware of the accumulator or "init" value. In this case that's another list, and we can pattern match out that list.

    If it's not empty and the first element is equal to the current element we're looking at, don't add it to the result list. Otherwise, do add it. This also handles the first element case where the accumulator is empty.

    Kudos on creating a tail-recursive solution to this problem!

    tail_mod_cons

    As of OCaml 4.14.0 and later, we can make this tail-recursive without jumping through any hoops using the tail module constructor.

    A simple non-tail-recursive solution:

    let rec compress = function
      | ([] | [_]) as lst -> lst
      | a::(b::_ as tl) when a = b -> compress tl
      | a::tl -> a :: compress tl
    

    This works, but clearly isn't tail-recursive.

    let[@tail_mod_cons] rec compress = function
      | ([] | [_]) as lst -> lst
      | a::(b::_ as tl) when a = b -> compress tl
      | a::tl -> a :: compress tl
    

    Now it is!

    Sequences

    We might also employ a lazier approach using sequences. This is naturally tail-recursive and allows us to consume as much or as little of the sequence as we wish.

    let elim_dupes_seq seq =
      let rec aux ?(last=None) seq () =
        match seq (), last with
        | Seq.Nil, None -> Seq.Nil
        | Seq.Nil, Some x -> Seq.Cons (x, Seq.empty)
        | Seq.Cons (x, seq'), None -> aux ~last: (Some x) seq' ()
        | Seq.Cons (x, seq'), Some x' when x = x' -> aux ~last seq' ()
        | Seq.Cons (x, seq'), Some x' -> Seq.Cons (x', aux ~last: (Some x) seq')
      in
      aux seq
    
    # [1; 2; 3; 4; 5; 5; 6; 7; 7; 8]
      |> List.to_seq
      |> elim_dupes_seq
      |> List.of_seq;;
    - : int list = [1; 2; 3; 4; 5; 6; 7; 8]
    # [1; 2; 3; 4; 5; 5; 6; 7; 7; 8]
      |> List.to_seq
      |> elim_dupes_seq
      |> Seq.take 4
      |> List.of_seq;;
    - : int list = [1; 2; 3; 4]