
Replacing any sequence of spaces, tabs, newlines, etc. with a single space using nom


I am learning parsing with nom right now, and a few problems have come up that I cannot solve myself. I'd like to ask about one of them here. First: I am not sure whether I actually need nom for this, but it seemed the easiest way to detect any combination of spaces, tabs, newlines and carriage returns. I want to replace every sequence of these characters with one single space. My rather inelegant solution looks like this:

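// Imports assumed for the nom 7 function-style combinators used below.
use nom::{
    branch::alt,
    character::complete::{anychar, multispace1},
    combinator::map,
    multi::many0,
    IResult,
};

enum EmptyOrStr {
    Empty,
    Str(char),
}
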
fn replace_multispaces<'a>(i: &str) -> IResult<&str, String> {
    let (rest, tokens) = many0(alt((
        map(multispace1, |_| EmptyOrStr::Empty),
        // I would like to use `not(multispace1)` instead of `anychar`,
        // but then the output type is `()` and I cannot put it into the
        // `EmptyOrStr::Str( )`-variant
        map(anychar, |s| EmptyOrStr::Str(s)),
    )))(i)?;

    Ok((
        "",
        tokens
            .into_iter()
            .map(|t| match t {
                EmptyOrStr::Str(s) => format!("{s}"), // because s is a `char` not a `&str`
                EmptyOrStr::Empty => " ".to_string(),
            })
            .collect::<String>(),
    ))
}

I didn't manage to do this with &str. Intuitively, I'd guess it would be better for the second parser inside alt to use something like take_till1(is_multispace1), but take_till1 needs a condition (a predicate on characters), not a parser. I think there are just too many things used together for me to understand everything.

Ideally, replacing runs of whitespace with single spaces wouldn't need nom at all, but the multispace1 function is pretty practical, I guess.

Are there any obvious ways I could improve this?


Solution

  • If you just want to split the input on any number of consecutive whitespace characters, you can use split_ascii_whitespace().

    The example function below uses split_ascii_whitespace() to split the input and then rejoins the pieces with a single space character.

    fn strip_multispace(input: &str) -> String {
        input.split_ascii_whitespace().collect::<Vec<&str>>().join(" ")
    }
    
    fn main() {
        let s = "This is    a string with\t multiple\n\r\n white\t\t\r\nspaces";
        println!("{}", strip_multispace(s));
    }
    

    This outputs:

    This is a string with multiple white spaces
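
  • One difference worth keeping in mind: split_ascii_whitespace() drops leading and trailing whitespace entirely, whereas the parser in the question would turn such a run into a single space. If that behaviour matters, a single pass over the characters is probably enough, still without nom. The following is only a minimal sketch under that assumption; the name collapse_whitespace is hypothetical, and the character set (space, tab, carriage return, line feed) is chosen to mirror what nom's multispace1 recognizes.

```rust
/// Collapses every run of spaces, tabs, carriage returns and line feeds
/// into one space. Leading and trailing runs are kept as a single space.
/// (Hypothetical helper, not part of std or nom.)
fn collapse_whitespace(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_run = false; // true while we are inside a whitespace run

    for c in input.chars() {
        if matches!(c, ' ' | '\t' | '\r' | '\n') {
            // Emit one space for the first character of the run only.
            if !in_run {
                out.push(' ');
            }
            in_run = true;
        } else {
            out.push(c);
            in_run = false;
        }
    }
    out
}

fn main() {
    let s = "  This is    a string with\t multiple\n\r\n white\t\t\r\nspaces\n";
    // Debug formatting makes the kept leading/trailing space visible.
    println!("{:?}", collapse_whitespace(s));
}
```

    This prints " This is a string with multiple white spaces " (including the quotes, because of the {:?} formatting), so the leading and trailing runs each survive as one space.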