I recently found out that bytes.nth(i) is waay faster than chars.nth(i) if you want to iterate over a String.
pub fn test(word1: String) {
for i in (0..word1.len()).rev() {
word1.bytes().nth(i);
}
}
// runs in +- 820 microseconds (word1.len() = 42551)
vs
pub fn test(word1: String) {
for i in (0..word1.len()).rev() {
word1.chars().nth(i);
}
}
// runs in +- 17.5 seconds (word1.len() = 42551)
I found the explanation te be because bytes.nth() has time complexity O(1) and chars.nth() has time complexity O(n). But why is this the case?
I also saw a suggestion to instantiate a variable and use the chars.next() / .next_back() like in the example below.
let mut word1_chars = word1.chars();
for i in (0..word1.len()).rev() {
word1_chars.next_back()
}
// runs in +- 600 microseconds (word1.len() = 42551)
I was shocked when I saw this was even faster. I presume there is a logical explanation for all of this so if you know it please share it :).
Thanks in advance
Edit: The times were recorded when building in debug mode. When building them for release there is no difference between bytes().nth() and chars.next() (While chars().nth() is more than 10 000 times slower). Why is there such a "big" difference between bytes().nth() and chars.next() in debug mode but not in release?
.chars()
enumerates the char
s of the string, not simply the bytes.
Coming from C for example, there is an historical confusion between char
(with text in mind) and byte (a small integer); in quick-and-dirty disposable programs we don't bother making the distinction (although we should).
On the other hand, in Rust a char
is not a single byte, but a wider integer which could be the result of decoding a UTF-8 sequence of bytes.
Such a single char
could come from a sequence of one, two, three or four bytes.
Rust strings are UTF-8 encoded, then accessing the N^th char
cannot be performed in constant time because each of the preceding char
s could be made of various bytes.
Each one has to be decoded in order, and we stop when N char
s have been decoded.
Concerning your second question about .next()
being «faster», I guess you compare something like
for i in 0..N {
let c = word.chars().nth(i);
// ...
}
against something like
for c in word.chars() {
// ...
}
which is similar to
let mut it = word.chars();
while let Some(c) = it.next() {
// ...
}
In the first case, each iteration implies decoding all the char
s from the beginning of the string, until i
.
The two other cases simply decode, at each iteration, only the next char
, starting from the previous byte-offset (kept from the previous iteration inside the iterator).
Note that .next_back()
is also efficient because a UTF-8 sequence of bytes can be decoded backwards.