I am learning parsing with nom right now, and there are a few problems that pop up that I cannot solve myself. One of them I'd like to ask about here.
First: I am not sure if I actually need to use nom for this, but it seemed the easiest way to detect any combination of spaces, tabs, newlines and carriage returns. I want to replace every sequence of these characters with a single space. My rather inelegant solution looks like this:
use nom::{
    branch::alt,
    character::complete::{anychar, multispace1},
    combinator::map,
    multi::many0,
    IResult,
};

enum EmptyOrStr {
    Empty,
    Str(char),
}

fn replace_multispaces<'a>(i: &str) -> IResult<&str, String> {
    let (rest, tokens) = many0(alt((
        map(multispace1, |_| EmptyOrStr::Empty),
        // i would like to use `not(multispace1)` instead of `anychar`,
        // but then the output type is `()` and I cannot put it into the
        // `EmptyOrStr::Str( )`-variant
        map(anychar, |s| EmptyOrStr::Str(s)),
    )))(i)?;
    Ok((
        "",
        tokens
            .into_iter()
            .map(|t| match t {
                EmptyOrStr::Str(s) => format!("{s}"), // because s is a `char` not a `&str`
                EmptyOrStr::Empty => " ".to_string(),
            })
            .collect::<String>(),
    ))
}
I didn't manage to do this with &str. Intuitively, I'd guess it would be better to have the second parser inside alt use something like take_till1(is_multispace1), but that needs a condition, not a parser.
I think there are just too many things used together for me to understand everything.
Ideally, replacing multispaces with single spaces wouldn't need nom at all, but the multispace1 function is pretty practical, I guess.
Are there any obvious ways I could improve this?
If you just want to split the input on any number of consecutive whitespace characters, you can use split_ascii_whitespace().
The example function below uses split_ascii_whitespace() to split the input and then rejoins the pieces with a single space character.
fn strip_multispace(input: &str) -> String {
    input.split_ascii_whitespace().collect::<Vec<&str>>().join(" ")
}

fn main() {
    let s = "This is a string with\t multiple\n\r\n white\t\t\r\nspaces";
    println!("{}", strip_multispace(s));
}
This outputs:
This is a string with multiple white spaces
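One difference worth knowing about: splitting and rejoining drops leading and trailing whitespace entirely, whereas the nom version in the question turns a leading or trailing run into one space. split_ascii_whitespace() also treats form feed as whitespace, which multispace1 does not. If that matters, here is a rough sketch of a single-pass alternative that only looks at the four characters multispace1 recognizes. The name collapse_multispace is made up for this example, and the approach is just one way to do it, not something taken from nom or the standard library.

```rust
/// Collapses every run of spaces, tabs, `\n` and `\r` into one space.
/// Unlike the split-and-join approach, a leading or trailing run is
/// kept as a single space instead of being trimmed away.
/// (`collapse_multispace` is a hypothetical helper name.)
fn collapse_multispace(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_run = false; // true while we are inside a whitespace run
    for c in input.chars() {
        if matches!(c, ' ' | '\t' | '\n' | '\r') {
            // Emit a space only for the first character of the run.
            if !in_run {
                out.push(' ');
                in_run = true;
            }
        } else {
            out.push(c);
            in_run = false;
        }
    }
    out
}

fn main() {
    let s = "  leading\t and\r\n trailing \n";
    // prints " leading and trailing " (including the quotes, because of {:?})
    println!("{:?}", collapse_multispace(s));
}
```

This avoids the intermediate Vec and works on any UTF-8 input, since it only compares against four ASCII characters and copies everything else unchanged.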