Does Rust provide a way to decode a single character (a Unicode scalar value, to be exact) from a &[u8]? The character may span multiple bytes, and the result should be a single USV.
Something like GLib's g_utf8_get_char & g_utf8_next_char:
// Example of what GLib's functions might look like once ported to Rust.
let mut i = 0;
while i < slice.len() {
    let unicode_char = g_utf8_get_char(&slice[i..]);
    // Do something with the Unicode character.
    function(unicode_char);
    // Move on to the next character.
    i += g_utf8_next_char(&slice[i..]);
}
Short of porting parts of the GLib API to Rust, does Rust provide a way to do this, besides some trial & error calls to from_utf8 which stop once the second character is reached?
See GLib's code.
Since Rust 1.79, &[u8] has a method utf8_chunks, which returns an iterator of type Utf8Chunks.
This can be used to get the functionality you want, even if the byte slice contains invalid UTF-8, though it's not the perfect API. The simplest way to use it looks like this:
let b: &[u8] = b"1\xD02";
for chunk in b.utf8_chunks() {
    for c in chunk.valid().chars() {
        println!("{} valid", c);
    }
    for byte in chunk.invalid() {
        println!("{} invalid", byte);
    }
}
If you want something more directly equivalent to your GLib example, one that gives you a value and a length, you can use:
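use std::cmp::min;

let mut i = 0;
while i < b.len() {
    // A UTF-8 sequence is at most 4 bytes, so only look that far ahead.
    let slice_end = min(i + 4, b.len());
    let remain = &b[i..slice_end];
    let chunk = remain.utf8_chunks().next().unwrap().valid();
    if let Some(c) = chunk.chars().next() {
        println!("{} valid", c);
        // Advance by this character only; the valid chunk may hold several characters.
        i += c.len_utf8();
    } else {
        println!("{} invalid", remain[0]);
        i += 1;
    }
}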
Unfortunately, the utf8_chunks API can be rather inefficient in some cases, because each chunk validates an entire run of valid UTF-8 at once, even if you only need the first few characters.
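One way to bound the work per character is to read the expected sequence length from the lead byte and hand exactly that many bytes to std::str::from_utf8, which still rejects overlong forms, surrogates and bad continuation bytes. The following is only a sketch of that idea: decode_first and decode_all are hypothetical helper names, not part of the standard library, and treating every undecodable position as a single bad byte is an assumption that mirrors the loop above rather than the grouping that Utf8Chunks::invalid reports.

```rust
/// Hypothetical helper (not in std): decode the first Unicode scalar value
/// in `bytes`. Returns the character and its encoded length in bytes, or
/// `None` if `bytes` is empty, truncated, or does not start with valid UTF-8.
fn decode_first(bytes: &[u8]) -> Option<(char, usize)> {
    let first = *bytes.first()?;
    // The lead byte determines how long the sequence is supposed to be.
    let width = match first {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => return None, // continuation byte or a byte that can never lead
    };
    // `get` returns None when the sequence is cut off by the end of the slice.
    let candidate = bytes.get(..width)?;
    // Validate only these `width` bytes.
    let s = std::str::from_utf8(candidate).ok()?;
    let c = s.chars().next()?;
    Some((c, width))
}

/// Hypothetical helper: walk the whole slice, yielding either a decoded
/// character or the single byte that could not start a valid sequence.
fn decode_all(bytes: &[u8]) -> Vec<Result<char, u8>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match decode_first(&bytes[i..]) {
            Some((c, len)) => {
                out.push(Ok(c));
                i += len;
            }
            None => {
                out.push(Err(bytes[i]));
                i += 1;
            }
        }
    }
    out
}

fn main() {
    for item in decode_all(b"1\xD02") {
        match item {
            Ok(c) => println!("{} valid", c),
            Err(byte) => println!("{} invalid", byte),
        }
    }
}
```

Each call inspects at most four bytes, so the cost per character no longer depends on how long the surrounding run of valid UTF-8 is.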