optimization functional-programming rust

Collatz conjecture in Rust: functional vs. imperative approach


I wanted to play around with the good old Collatz conjecture and decided that it would be fun to do it in a (very) functional style, so I implemented an unfoldr function close to the one Haskell has:

fn unfoldr<F, T>(foo: F, seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr(foo, y, vec)
    } else {
        vec
    }
}
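For example, a quick sanity check with a countdown (not part of my Collatz code, just to show the shape of unfoldr — the generating function yields the element to emit and the next seed):

```rust
fn unfoldr<F, T>(foo: F, seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr(foo, y, vec)
    } else {
        vec
    }
}

fn main() {
    // Emit n and continue with n - 1 until the seed reaches zero.
    let countdown = unfoldr(|n| if n == 0 { None } else { Some((n, n - 1)) },
                            5u32,
                            Vec::new());
    assert_eq!(countdown, vec![5, 4, 3, 2, 1]);
}
```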

The rest was pretty straightforward:

fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

pub fn collatz_seq_f(n: u64) -> Vec<u64> {
    unfoldr(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, Vec::new())
}

Here collatz_seq_f returns a Vec containing the sequence starting from a given number n.
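For a small starting value the result looks like this (definitions repeated so the snippet compiles standalone; note that the sequence stops before pushing the final 1):

```rust
fn unfoldr<F, T>(foo: F, seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr(foo, y, vec)
    } else {
        vec
    }
}

fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

pub fn collatz_seq_f(n: u64) -> Vec<u64> {
    unfoldr(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, Vec::new())
}

fn main() {
    // 6 → 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1; the trailing 1 is not pushed.
    assert_eq!(collatz_seq_f(6), vec![6, 3, 10, 5, 16, 8, 4, 2]);
}
```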

I wondered, however, whether Rust approves of this style, and implemented a simple imperative counterpart:

pub fn collatz_seq_i(n: u64, mut vec: Vec<u64>) -> Vec<u64> {
    let mut c = n;
    while c != 1 {
        vec.push(c);
        c = collatz_next(c);
    }
    vec
}

And compared them with cargo bench (0.13.0-nightly (2ef3cde 2016-09-04)). I was a bit disappointed that my fun unfoldr approach was only half as fast as the imperative implementation:

running 3 tests
test tests::it_works ... ignored
test tests::bench_collatz_functional ... bench:         900 ns/iter (+/- 47)
test tests::bench_collatz_imperative ... bench:         455 ns/iter (+/- 29)

test result: ok. 0 passed; 0 failed; 1 ignored; 2 measured

I know that the unfoldr version is more abstract, but I didn't expect that much of a difference; is there anything that I could change to make it faster?

Full code below:

#![feature(test)]

extern crate test;

fn unfoldr<F, T>(foo: F, seed: T, mut vec: Vec<T>) -> Vec<T>
    where F: Fn(T) -> Option<(T, T)>
{
    if let Some((x, y)) = foo(seed) {
        vec.push(x);
        unfoldr(foo, y, vec)
    } else {
        vec
    }
}

fn collatz_next(n: u64) -> u64 {
    if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
}

pub fn collatz_seq_f(n: u64) -> Vec<u64> {
    unfoldr(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, Vec::new())
}

pub fn collatz_seq_i(n: u64, mut vec: Vec<u64>) -> Vec<u64> {
    let mut c = n;
    while c != 1 {
        vec.push(c);
        c = collatz_next(c);
    }
    vec
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[test]
    fn it_works() {
        assert_eq!(110, collatz_seq_f(27).len());
        assert_eq!(110, collatz_seq_i(27, Vec::new()).len());
    }

    #[bench]
    fn bench_collatz_functional(b: &mut Bencher) {
        b.iter(|| collatz_seq_f(27));
    }

    #[bench]
    fn bench_collatz_imperative(b: &mut Bencher) {
        b.iter(|| collatz_seq_i(27, Vec::new()));
    }
}

Solution

  • This answer goes into some implementation details of why unfoldr is a bit slow.

    I proposed a different variant, and @breeden helped me verify that it is an improvement that makes it match the performance of the imperative variant. It does preserve the recursion, but we can't really call it functional anymore. [^1]

    fn unfoldr2<F, T>(foo: F, seed: T, vec: &mut Vec<T>)
        where F: Fn(T) -> Option<(T, T)>
    {
        if let Some((x, y)) = foo(seed) {
            vec.push(x);
            unfoldr2(foo, y, vec)
        }
    }
    
    fn collatz_next(n: u64) -> u64 {
        if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
    }
    
    pub fn collatz_seq_f(n: u64) -> Vec<u64> {
        let mut v = Vec::new();
        unfoldr2(|n| if n == 1 { None } else { Some((n, collatz_next(n))) }, n, &mut v);
        v
    }
    

    The difference here illustrates what “went wrong” with the first version. In unfoldr, a vec value is carried around by value; in unfoldr2, there is just a mutable reference to a vector instead.

    The vec value has an effect in unfoldr that limited the compiler, and you discovered it: unwinding. Unwinding is what happens when a function panics. If a panic unwinds through the unfoldr function, all local variables must be dropped, and that includes vec. Special code (called a “landing pad”) is inserted to deal with this, and every function call that may panic gets an instruction to divert to the landing pad on panic.

    So in unfoldr:

    1. There's a local variable that has a destructor, vec
    2. There's a function call that may panic (vec.push panics on capacity overflow)
    3. There's a landing pad that drops vec and resumes unwinding

    Additionally, there's code to move the Vec value around. (It is copied to the stack to be available for the landing pad code).

    unfoldr2 doesn't get any magical recursion-into-a-loop optimization either, but it still generates less code because it has no need to handle unwinding or to move the Vec.
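    As an aside (my addition, worth verifying on your setup): you can test this explanation by disabling unwinding entirely. With panic set to "abort", no landing pads are generated at all, so the drop-on-unwind code in unfoldr should vanish. In Cargo this is a profile option:

    ```toml
    # Cargo.toml — abort on panic instead of unwinding.
    # With this setting the compiler emits no landing pads.
    [profile.release]
    panic = "abort"
    ```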

    [^1]: Could we salvage the functional-ness by imagining vec.push(x) as an interface to a stream / generator / output iterator, or just a callback?
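    A sketch of that callback idea (my own illustration; unfoldr_cb is a hypothetical name): instead of owning a Vec, the function hands each produced element to a FnMut closure, and the caller decides what to do with it.

    ```rust
    // unfoldr with a callback instead of an accumulator vector: each
    // produced element is passed to `emit`; the recursion is unchanged.
    fn unfoldr_cb<F, G, T>(foo: F, seed: T, emit: &mut G)
        where F: Fn(T) -> Option<(T, T)>,
              G: FnMut(T)
    {
        if let Some((x, y)) = foo(seed) {
            emit(x);
            unfoldr_cb(foo, y, emit);
        }
    }

    fn collatz_next(n: u64) -> u64 {
        if n % 2 == 0 { n / 2 } else { 3 * n + 1 }
    }

    fn main() {
        // Collecting into a Vec is now just one possible callback.
        let mut v = Vec::new();
        unfoldr_cb(|n| if n == 1 { None } else { Some((n, collatz_next(n))) },
                   6,
                   &mut |x| v.push(x));
        assert_eq!(v, vec![6, 3, 10, 5, 16, 8, 4, 2]);
    }
    ```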