rustiteratorclonemultipass

What is the correct signature for a rust function which wants to iterate several times over an iterator/collection and avoid unnecessary cloning


I have a mathematical function which is multi-pass: to calculate the results, multiple iterations over a collection are needed. Typically a single-pass function would accept an "impl IntoIterator" (or be generic over I where I: IntoIterator) so that a Vec, or a slice or an iterator, or an iterator with subsequent filtering could be passed as an argument.

The problem I have is that


pub fn my_func<I>(into_iter: I) -> f64
where I: IntoIterator<Item=f64>, I::IntoIter: Clone {
   let iter1 = into_iter.into_iter();
   let iter2 = iter1.clone();
   // iterate first pass using iter1
   // then iterator second pass using iter2
   // the actual calculations here are just placeholders
   let res1: f64 = iter1.filter(|&x| x > 1.0).sum();
   let res2: f64 = iter2.filter(|&x| x > 2.0).sum();
   res1 / res2
}

permits Vec to be accepted as an argument. For single pass functions this is fine, as into_iter() will move the contents of the vector into the IntoIter struct, and iterate over them.

But for multi-pass functions where the iterator is cloned, IntoIter for Vec owns the collection of items and "clone" of the IntoIter results in the entire vector of items being cloned. Although clone on many iterators is lightweight, it is not the case when the Iterator is generated from Vec::into_iter().

I want my clients to call this as my_func(vec.iter()) or strictly speaking my_func(vec.iter().copied())

How can I prevent this accidentally happening by clients of the library whilst retaining flexibility to accept slices, iterators etc as arguments?

    let v = vec![1.0; 10_000_000];
    let _ = my_func(v.iter().copied());  // prferred calling method

    let v = vec![1.0; 10_000_000];
    let _ = my_func(v);                  // I want to forbid this

    let v = vec![1.0; 10_000_000];
    let _ = my_func(v.into_iter());      // I want to forbid this too!

Obviously I can accept a cloneable iterator as argument but that still doesnt deter vec.into_iter()


Solution

  • If you don't mind changing the calling syntax, then I think I found a maybe acceptable method: assume that it is always cheap to clone an iterator yielding reference, one could constrain the function to only accept them:

    pub fn my_func<'it, I>(into_iter: I) -> f64
    where
        I: IntoIterator<Item = &'it f64>,
        I::IntoIter: Clone,
    {
        let iter1 = into_iter.into_iter().copied();
        let iter2 = iter1.clone();
        // iterate first pass using iter1
        // then iterator second pass using iter2
        // the actual calculations here are just placeholders
        let res1: f64 = iter1.filter(|&x| x > 1.0).sum();
        let res2: f64 = iter2.filter(|&x| x > 2.0).sum();
        res1 / res2
    }
    

    This effectively rejects v, v.into_iter() and any iterators combined from an iterator that yields f64, as it seems like there's generally no reasonable way to get an iterator that returns reference from an owning one, and combinators on Iterator commonly don't change the Item type, unless they incur some kind of allocation:

    let v = vec![1.0; 10_000_000];
    let _ = my_func(&v);
    let _ = my_func(v.iter());
    let _ = my_func(v.iter().filter(|&&f| f < 5.0));
    let _ = my_func([1.0; 100].iter());
    
    let f = 4.0;
    let _ = my_func(iter::repeat(&0.5));
    let _ = my_func(iter::repeat(&f));
    
    // forbidden
    // let _ = my_func(v);
    
    // forbidden
    // let _ = my_func(v.into_iter());
    
    // forbidden
    // let _ = my_func(v.into_iter().map(|f| f + 1.));
    

    There's still a problem, though. Consider we wanna implement a custom iterator that calculates something each step and returns the result:

    let _ = my_func(
        iter::repeat_with({
            let mut cnt = 0.;
            move || {
                let next = cnt;
                cnt += 1.;
                // trouble here: can't return reference to local?
                // &next
            };
        })
        .take(10),
    );
    

    A simple workaround is to define a wrapper around f64, with a Deref implementation referring to its data, and adjust my_func and the custom iter accordingly:

    pub fn my_func<'it, I, Item>(into_iter: I) -> f64
    where
        I: IntoIterator<Item = Item>,
        I::IntoIter: Clone,
        Item: Deref<Target = f64>,
    {
        let iter1 = into_iter.into_iter().map(|i| *i);
        let iter2 = iter1.clone();
        // iterate first pass using iter1
        // then iterator second pass using iter2
        // the actual calculations here are just placeholders
        let res1: f64 = iter1.filter(|&x| x > 1.0).sum();
        let res2: f64 = iter2.filter(|&x| x > 2.0).sum();
        res1 / res2
    }
    
    #[repr(transparent)]
    struct RefVal<T>(T);
    
    impl<T> Deref for RefVal<T> {
        type Target = T;
    
        fn deref(&self) -> &T {
            &self.0
        }
    }
    
    
    fn main() {
        let v = vec![1.5; 10_000_000];
        let _ = my_func(&v);
        let _ = my_func(v.iter());
        let _ = my_func(v.iter().filter(|&&f| f < 5.));
        let _ = my_func([1.; 100].iter());
    
        let f = 4.0;
        let _ = my_func(iter::repeat(&0.5).take(3));
        let _ = my_func(iter::repeat(&f).take(3));
    
        let _ = my_func(
            iter::repeat_with({
                let mut cnt = 0.;
                move || {
                    let next = cnt;
                    cnt += 1.;
                    RefVal(next)
                }
            })
            .take(10),
        );
    
        // still forbidden
        // let _ = my_func(v);
    
        // still forbidden
        // let _ = my_func(v.into_iter());
    
        // forbidden
        // let _ = my_func(v.into_iter().map(|f| f + 42.).filter(|&f| f % 1));
    
        let _ = my_func(v.clone().into_iter().map(RefVal));
    }
    

    This will allow the user to write my_func(v.into_iter().map(RefVal)), but now they're intentionally doing this, not by accident.

    Full code at playground