How to satisfy Rayon's trait bounds in matrix multiplication algorithm?

I implemented matrix multiplication using functional approach.

struct Matrix {
  vals: Vec<i32>,
  rows: usize,
  cols: usize,
}

fn mul(a: &Matrix, b: &Matrix) -> Matrix {
  assert_eq!(a.cols, b.rows);
  let vals = a
    .vals
    .par_chunks(a.cols)
    .map(|row| {
      (0..b.cols).map(|i| {
        row
          .iter()
          .zip(b.vals.iter().skip(i).step_by(b.cols))
          .map(|(a, b)| a * b)
          .sum()
      })
    })
    .flatten()
    .collect();
  Matrix {
    vals,
    rows: a.rows,
    cols: b.cols,
  }
}

I tried to parallelize this algorithm by using par_chunks from Rayon 1.8.1 instead of chunks, so each row is constructed in a separate thread. With chunks the algorithm works, but with par_chunks it results in two errors:

error[E0277]: the trait bound `std::iter::Map<std::ops::Range<usize>, {closure@src\main.rs:145:23: 145:26}>: rayon::iter::ParallelIterator` is not satisfied
   --> src\main.rs:153:6
    |
153 |     .flatten()
    |      ^^^^^^^ the trait `rayon::iter::ParallelIterator` is not implemented for `std::iter::Map<std::ops::Range<usize>, {closure@src\main.rs:145:23: 145:26}>`

and

error[E0599]: the method `collect` exists for struct `Flatten<Map<Chunks<'_, i32>, {closure@main.rs:144:10}>>`, but its 
trait bounds were not satisfied
   --> src\main.rs:154:6
    |
141 |     let vals = a
    |  ______________-
142 | |     .vals
143 | |     .par_chunks(a.cols)
144 | |     .map(|row| {
...   |
153 | |     .flatten()
154 | |     .collect();
    | |     -^^^^^^^ method cannot be called due to unsatisfied trait bounds
    | |_____|
    | 
    |
   ::: C:\rust\.rustup\toolchains\stable-x86_64-pc-windows-gnu\lib/rustlib/src/rust\library\core\src\iter\adapters\map.rs:62:1
    |
62  |   pub struct Map<I, F> {
    |   -------------------- doesn't satisfy `_: IntoParallelIterator`
    |
   ::: C:\rust\.cargo\registry\src\index.crates.io-6f17d22bba15001f\rayon-1.8.1\src\iter\flatten.rs:11:1
    |
11  |   pub struct Flatten<I: ParallelIterator> {
    |   ---------------------------------------
    |   |
    |   doesn't satisfy `_: Iterator`
    |   doesn't satisfy `_: ParallelIterator`

I have no idea what is going on. I suspect that it might be somehow related to sharing second matrix between threads but it is read-only and is guaranteed to stay in the scope until threads have finished their jobs. Error messages don't help at all. What am I doing wrong? How do I even read this error message?

UPD. Rayon's prelude is included.

Solution

The error messages are not great here (the content is mostly correct but the location is misleading), but luckily there's only a few things that can go wrong when Rayon's trait bounds are not satisfied.

Here, flatten expects to find items that implement ParallelIterator. Instead, your map produces items which are a regular Iterator type. You can fix this by inserting an into_par_iter:

.map(|row| {
    (0..b.cols).into_par_iter().map(|i| {
        row.iter()
            .zip(b.vals.iter().skip(i).step_by(b.cols))
            .map(|(a, b)| a * b)
            .sum()
    })
})

Rayon also has the flatten_iter method that is similar to flatten but works on regular Iterator types. This works well if the outer iterator already has enough items to occupy all your cores.

.map(|row| {
    (0..b.cols).map(|i| {
        row.iter()
            .zip(b.vals.iter().skip(i).step_by(b.cols))
            .map(|(a, b)| a * b)
            .sum()
    })
})
.flatten_iter()

And finally, there's flat_map_iter, which just combines map and flatten_iter into one function.

.flat_map_iter(|row| {
    (0..b.cols).map(|i| {
        row.iter()
            .zip(b.vals.iter().skip(i).step_by(b.cols))
            .map(|(a, b)| a * b)
            .sum()
    })
})
// no .flatten()