rust rayon

Rayon Can't turn .chars() iterator into .par_iter()


I'm trying to parallelize the following function:

pub fn encode(&self, s: &String) -> String {
        s.chars()
            .par_iter() // error here
            .map(|c| Character::try_from(c))
            .enumerate()
            .map(|(n, c)| match c {
                Ok(plain) => self.encode_at(plain, n).into(),
                Err(e) => match e {
                    ParsingError::Charset(non_alphabetic) => non_alphabetic,
                    _ => unreachable!(),
                },
            })
            .collect()
    }

I get the following error when trying to go from the Chars iterator into a parallel iterator:

the method par_iter exists for struct std::str::Chars<'_>, but its trait bounds were not satisfied
the following trait bounds were not satisfied:
&std::str::Chars<'_>: IntoParallelIterator
which is required by std::str::Chars<'_>: rayon::iter::IntoParallelRefIterator rustc(E0599)

I would expect converting an iterator into a parallel iterator to be fairly trivial, but apparently it is not.


Solution

  • The problem is that characters in UTF-8 have variable size: ASCII characters take one byte, while all others take two to four bytes. This makes splitting a string for parallel processing awkward, since the middle byte of the string's buffer may not be the middle of the string in characters, and may even fall inside a multi-byte character.

    That said, this does not make parallel processing impossible. The string does not need to be split evenly among workers, and UTF-8 is self-synchronizing: continuation bytes always have the bit pattern 10xxxxxx, so from any byte offset you can scan to the nearest character boundary.

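    So iterating over a string in parallel is feasible, and rayon supports it directly: the rayon::str::ParallelString trait (exported by rayon::prelude) adds par_chars() to str. The error arises because s.chars() returns a sequential iterator, and par_iter() is only available on types whose references implement IntoParallelIterator, such as slices and Vec. Note that par_chars() is not an indexed parallel iterator, so .enumerate() is not available on it. If the position is needed, one option is to collect the characters into a Vec<char> first and call par_iter().enumerate() on that.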
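
To illustrate the point about splitting on character boundaries, here is a minimal sketch that uses only the standard library (no rayon). The helper names next_char_boundary, split_on_char_boundaries and parallel_map_chars are hypothetical and made up for this example. It assumes Rust 1.63 or later for std::thread::scope, and it is not how rayon implements this internally.

```rust
use std::thread;

/// Move `index` forward to the nearest UTF-8 character boundary in `s`.
/// (Hypothetical helper for this sketch.)
fn next_char_boundary(s: &str, mut index: usize) -> usize {
    // `is_char_boundary` is false while `index` points at a continuation byte.
    while index < s.len() && !s.is_char_boundary(index) {
        index += 1;
    }
    index.min(s.len())
}

/// Split `s` into at most `parts` pieces, cutting only on character boundaries.
/// The pieces are roughly, not exactly, equal in byte length.
fn split_on_char_boundaries(s: &str, parts: usize) -> Vec<&str> {
    let parts = parts.max(1);
    // Approximate number of bytes per piece, rounded up.
    let target = (s.len() + parts - 1) / parts;
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < s.len() {
        let end = next_char_boundary(s, (start + target).min(s.len()));
        pieces.push(&s[start..end]);
        start = end;
    }
    pieces
}

/// Apply `f` to every character, handling each piece on its own scoped thread,
/// then join the results in the original order.
fn parallel_map_chars<F>(s: &str, parts: usize, f: F) -> String
where
    F: Fn(char) -> char + Sync,
{
    let pieces = split_on_char_boundaries(s, parts);
    thread::scope(|scope| {
        // Spawn every worker first so they can run concurrently.
        let handles: Vec<_> = pieces
            .iter()
            .map(|piece| {
                let f = &f;
                scope.spawn(move || piece.chars().map(f).collect::<String>())
            })
            .collect();
        // Joining in spawn order preserves the character order.
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .collect::<String>()
    })
}
```

For example, parallel_map_chars(s, 4, |c| c.to_ascii_uppercase()) would process s in up to four pieces. A position-dependent transform such as encode_at in the question would also need each piece's starting character index, which this sketch does not compute.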