
What is the idiomatic way in Rust's nom to turn (map) a parsing error into an Ok result?


I am not sure if I'm thinking about the whole thing wrong. Maybe there is a simpler solution.

In nom I want to parse C-style single-line comments. Each line that I parse could theoretically contain a "// some comment" on the right side. I wrote a parser that can parse these comments:

pub fn parse_single_line_comments(i: &str) -> IResult<&str, &str> {
    recognize(pair(tag("//"), is_not("\n\r")))(i)
}

It works when a comment is present, but unfortunately it returns an error when there is none. I would like it to return an empty string instead (or, later, an Option, which would be more elegant). While learning nom I have run into this problem quite often: I want to replace an error with a custom Ok variant, but I am never sure whether I did it the "right" way, i.e. the idiomatic way in nom/Rust. It always felt ugly because I was matching on the return value of the parsing function. Think of it like this:

pub fn parse_single_line_comments(i: &str) -> IResult<&str, &str> {
    match recognize(pair(tag("//"), is_not("\n\r")))(i) {
        Ok((rest, comment)) => Ok((rest, comment)),
        // Any error is replaced by an empty comment, consuming no input
        _ => Ok((i, "")),
    }
}

It looks kind of strange to me. There should be a better way to do this, right?


Solution

  • You already hinted at it yourself. You could use opt to parse zero-or-one line comments, or many0 to parse zero-to-many. Combine either with preceded, and you can easily discard zero-to-many comments (and whitespace).
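
    To make the zero-or-one case concrete: nom's opt combinator (nom::combinator::opt) wraps a parser and turns a recoverable error into an Ok carrying None, leaving the input untouched. The following is a dependency-free sketch of that behavior, not nom's implementation; PResult, line_comment, optional and comment_or_empty are hypothetical names chosen for illustration.

```rust
// Dependency-free sketch: these items only mimic the shape of nom's IResult
// and `opt`. All names here are hypothetical, not part of nom's API.
type PResult<'a, O> = Result<(&'a str, O), &'static str>;

/// Recognizes `//` followed by everything up to the end of the line.
fn line_comment(input: &str) -> PResult<'_, &str> {
    if !input.starts_with("//") {
        return Err("expected `//`");
    }
    let end = input
        .find(|c: char| c == '\n' || c == '\r')
        .unwrap_or(input.len());
    Ok((&input[end..], &input[..end]))
}

/// Wraps a parser so that an error becomes `Ok((input, None))`,
/// with no input consumed.
fn optional<'a, O>(
    parser: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, Option<O>> {
    move |input| match parser(input) {
        Ok((rest, out)) => Ok((rest, Some(out))),
        Err(_) => Ok((input, None)),
    }
}

/// Same idea, but yields an empty string instead of `None`.
fn comment_or_empty(input: &str) -> PResult<'_, &str> {
    let (rest, comment) = optional(line_comment)(input)?;
    Ok((rest, comment.unwrap_or("")))
}

fn main() {
    println!("{:?}", optional(line_comment)("// hi\nrest"));
    println!("{:?}", optional(line_comment)("no comment"));
    println!("{:?}", comment_or_empty("x = 1"));
}
```

    With nom itself, the same effect should be achievable by wrapping the comment parser in opt, and in map(opt(...), |c| c.unwrap_or("")) if an empty string is preferred over an Option.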

    Let's consider a simple parse_ident to parse identifiers, that looks like this:

    use nom::bytes::complete::take_while1;
    use nom::{AsChar, IResult};
    
    fn parse_ident(input: &str) -> IResult<&str, &str> {
        take_while1(|c: char| c.is_alpha() || (c == '_'))(input)
    }
    

    Now, again, let's say we want to skip zero-to-many whitespace and comments beforehand. First we can define our line comment parser (which you already did):

    fn parse_single_line_comment(input: &str) -> IResult<&str, &str> {
        recognize(pair(tag("//"), is_not("\n\r")))(input)
    }
    

    Now we'll change parse_ident to use preceded and many0 to skip zero-to-many line comments. We can also throw in multispace1; wrapped in many0, it skips zero-to-many runs of whitespace as well:

    use nom::branch::alt;
    use nom::bytes::complete::{is_not, tag, take_while1};
    use nom::character::complete::multispace1;
    use nom::combinator::recognize;
    use nom::multi::many0;
    use nom::sequence::{pair, preceded};
    use nom::{AsChar, IResult};
    
    fn parse_ident(input: &str) -> IResult<&str, &str> {
        preceded(
            // Parsers to skip anything that is ignored
            many0(alt((
                parse_single_line_comment,
                multispace1,
            ))),
            // Identifier parsing
            take_while1(|c: char| c.is_alpha() || (c == '_')),
        )(input)
    }
    

    Which now allows us to successfully parse the following:

    assert_eq!(
        parse_ident("identifier"),
        Ok(("", "identifier"))
    );
    assert_eq!(
        parse_ident("     identifier"),
        Ok(("", "identifier"))
    );
    assert_eq!(
        parse_ident("// Comment\n  identifier"),
        Ok(("", "identifier"))
    );
    assert_eq!(
        parse_ident("// Comment\n// Comment\n  identifier"),
        Ok(("", "identifier"))
    );
    

    Depending on what you're parsing, you'll need to sprinkle that preceded into various parsers. We can reduce the duplicated code a bit by introducing our own skip_ignored parser:

    fn skip_ignored<'a, F>(parser: F) -> impl FnMut(&'a str) -> IResult<&'a str, &'a str>
    where
        F: FnMut(&'a str) -> IResult<&'a str, &'a str>,
    {
        preceded(
            many0(alt((
                parse_single_line_comment,
                multispace1,
            ))),
            parser,
        )
    }
    
    fn parse_ident(input: &str) -> IResult<&str, &str> {
        skip_ignored(
            take_while1(|c: char| c.is_alpha() || (c == '_')),
        )(input)
    }
    

    Whether there are easier ways to do this depends heavily on your data. But as long as you simply want to discard whitespace and comments, it's relatively straightforward.


    Since you actually asked about custom errors, you can define your own enum as you otherwise would, and then impl ParseError for it:

    use nom::error::{ErrorKind, ParseError};
    
    #[derive(Debug)]
    pub enum MyParseError<'a> {
        IdentTooLong,
        Nom(&'a str, ErrorKind),
    }
    
    impl<'a> ParseError<&'a str> for MyParseError<'a> {
        fn from_error_kind(input: &'a str, kind: ErrorKind) -> Self {
            Self::Nom(input, kind)
        }
    
        fn append(_: &'a str, _: ErrorKind, other: Self) -> Self {
            other
        }
    }
    

    Using it could look like this:

    use nom::bytes::complete::take_while1;
    use nom::{AsChar, IResult};
    
    fn parse_ident<'a>(input: &'a str) -> IResult<&'a str, &'a str, MyParseError<'a>> {
        let (input, ident) = take_while1(|c: char| c.is_alpha() || (c == '_'))(input)?;
    
        // Return error if identifier is longer than 10 bytes
        if ident.len() > 10 {
            Err(nom::Err::Failure(MyParseError::IdentTooLong))
        } else {
            Ok((input, ident))
        }
    }
    
    fn main() {
        println!("{:?}", parse_ident(""));
        // Err(Error(Nom("", TakeWhile1)))   
    
        println!("{:?}", parse_ident("hello"));
        // Ok(("", "hello"))
    
        println!("{:?}", parse_ident("this_is_a_very_long_name"));
        // Err(Failure(IdentTooLong)) 
    }
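
    One consequence of returning nom::Err::Failure here: nom's opt and alt recover only from nom::Err::Error, so a Failure propagates through them instead of being turned into an Ok or triggering the next alternative. The dependency-free sketch below mimics that distinction; ParseErr, PResult, ident and optional are hypothetical names that only mirror the shape of nom's types.

```rust
// Dependency-free sketch; none of these names are nom's API.
#[derive(Debug, PartialEq)]
enum ParseErr {
    Error(&'static str),   // recoverable: another branch may still match
    Failure(&'static str), // unrecoverable: abort the whole parse
}

type PResult<'a, O> = Result<(&'a str, O), ParseErr>;

/// Parses an identifier of ASCII letters and `_`, at most 10 bytes long.
fn ident(input: &str) -> PResult<'_, &str> {
    let end = input
        .find(|c: char| !(c.is_ascii_alphabetic() || c == '_'))
        .unwrap_or(input.len());
    if end == 0 {
        Err(ParseErr::Error("expected identifier"))
    } else if end > 10 {
        Err(ParseErr::Failure("identifier too long"))
    } else {
        Ok((&input[end..], &input[..end]))
    }
}

/// Recovers from `Error` only; `Failure` is passed through unchanged.
fn optional<'a, O>(
    parser: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, Option<O>> {
    move |input| match parser(input) {
        Ok((rest, out)) => Ok((rest, Some(out))),
        Err(ParseErr::Error(_)) => Ok((input, None)),
        Err(failure) => Err(failure),
    }
}

fn main() {
    println!("{:?}", optional(ident)("123"));
    println!("{:?}", optional(ident)("hello world"));
    println!("{:?}", optional(ident)("this_is_a_very_long_name"));
}
```

    So if a too-long identifier should merely make an optional parser yield nothing, the Error variant would be the one to return; Failure fits cases where the whole parse should stop.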
    

    There's also FromExternalError, which works hand-in-hand with map_res. This is useful if, say, you want to call str::parse() and easily map its error into your MyParseError.
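
    As a rough picture of what that pairing automates, here is a hand-written, dependency-free sketch: run a text parser, call str::parse() on the matched slice, and convert the standard-library error into a custom variant. NumError, digits and parse_u8 are hypothetical names for illustration; with nom, map_res would perform the conversion step through a FromExternalError impl instead of the explicit map_err.

```rust
use std::num::ParseIntError;

// Hypothetical error type for this sketch; it is not the MyParseError above.
#[derive(Debug, PartialEq)]
enum NumError<'a> {
    // The text matched, but converting it failed (the "external" error).
    BadNumber(&'a str, ParseIntError),
    // The text did not match at all.
    Expected(&'a str, &'static str),
}

/// Recognizes one or more ASCII digits.
fn digits(input: &str) -> Result<(&str, &str), NumError<'_>> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(NumError::Expected(input, "digit"));
    }
    Ok((&input[end..], &input[..end]))
}

/// Parses digits, then maps the `str::parse` error into `NumError`.
fn parse_u8(input: &str) -> Result<(&str, u8), NumError<'_>> {
    let (rest, text) = digits(input)?;
    let value = text
        .parse::<u8>()
        .map_err(|e| NumError::BadNumber(input, e))?;
    Ok((rest, value))
}

fn main() {
    println!("{:?}", parse_u8("42 rest"));
    println!("{:?}", parse_u8("999"));
    println!("{:?}", parse_u8("abc"));
}
```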

    See also: