rust nom

Nom parser fails to not consume invalid input


My test parse_line_repeat fails on its last input, an empty line that is invalid at that point: parse_line returns an error instead of returning the line as the remainder, as shown in the test itself.

I have tried using tuple instead of pair and many0 instead of many1. The first question was settled by this blog post on nom, and the second by rereading the documentation on choosing a combinator.

What I want to do might be described as error recovery, but this article says otherwise, so I am out of ideas for what to search for.

Here is the lib.rs

use nom::{
    branch::alt,
    bytes::complete::tag,
    character::complete::space0,
    multi::{many1, many_till},
    sequence::pair,
    IResult,
};

fn parse_token(input: &str) -> IResult<&str, bool> {
    let (remaining, token) = alt((tag("0"), tag("1")))(input)?;
    if token == "1" {
        return Ok((remaining, true));
    } else {
        return Ok((remaining, false));
    }
}

#[allow(dead_code)]
fn parse_line(input: &str) -> IResult<&str, Vec<bool>> {
    let (remaining, (tokens_raw, _)) = pair(
        many1(many_till(space0, parse_token)),
        many_till(space0, tag("\n")),
    )(input)?;

    let mut tokens = Vec::new();

    for (_, token) in tokens_raw {
        let token: bool = token;
        tokens.push(token);
    }

    Ok((remaining, tokens))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parse_token_test() {
        assert_eq!(parse_token("0"), Ok(("", false)));
        assert_eq!(parse_token("1"), Ok(("", true)));
    }

    #[test]
    fn parse_line_test() {
        assert_eq!(parse_line("1 \n"), Ok(("", vec![true])));
        assert_eq!(parse_line("0 \n"), Ok(("", vec![false])));
        assert_eq!(
            parse_line("1 1 1 1\n"),
            Ok(("", vec![true, true, true, true]))
        );
    }

    #[test]
    fn parse_line_repeat() {
        let rtn = parse_line("1 \n 1\n");
        assert_eq!(rtn, Ok((" 1\n", vec![true])));
        let rtn = parse_line(rtn.unwrap().0);
        assert_eq!(rtn, Ok(("", vec![true])));
        let rtn = parse_line("1\n 1\n\n");
        assert_eq!(rtn, Ok((" 1\n\n", vec![true])));
        let rtn = parse_line(rtn.unwrap().0);
        assert_eq!(rtn, Ok(("\n", vec![true])));
        let rtn = parse_line(rtn.unwrap().0);
        assert_eq!(rtn, Ok(("\n", vec![])));
    }
}

Here is the Cargo.toml

[package]
name = "minimal"
version = "0.1.0"
edition = "2021"

[lib]
name = "minimal"
path = "src/lib.rs"

[dependencies]
nom = "7.1.3"

Error:

$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 0.94s
     Running unittests minimal/main.rs

running 3 tests
test tests::parse_line_test ... ok
test tests::parse_token_test ... ok
test tests::parse_line_repeat ... FAILED

failures:

---- tests::parse_line_repeat stdout ----
thread 'tests::parse_line_repeat' panicked at 'assertion failed: `(left == right)`
  left: `Err(Error(Error { input: "\n", code: ManyTill }))`,
 right: `Ok(("\n", []))`', minimal/main.rs:67:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::parse_line_repeat

test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--lib`



Solution

  • I think the real problem is that the behavior your test expects is inconsistent.

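    Reworded in simpler terms, your last test checks the following behavior for parse_line:

    parse_line("1\n 1\n\n") consumes "1\n", returning [true] with remainder " 1\n\n"

    Then, you check:

    parse_line(" 1\n\n") consumes " 1\n", returning [true] with remainder "\n"

    And finally:

    parse_line("\n") consumes nothing, returning [] with remainder "\n"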

    That last one doesn't make much sense to me. If 1\n should become <empty>, then why should \n not also become <empty>? Why does one consume the newline, and the other doesn't? If it is because an empty line without 1s should not be consumed by the parser, then why is it wrong that it returns an error? Returning an error is what is expected of a parser that cannot consume an object.

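    So there are two consistent ways you could go: either the parser consumes the empty line and returns an empty vector with an empty remainder, or it refuses the empty line and returns an error.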

    Your original code suggests that you would like the empty line not to be consumed. In that case, I would modify the test to:

    #[test]
    fn parse_line_repeat() {
        let rtn = parse_line("1 \n 1\n");
        assert_eq!(rtn, Ok((" 1\n", vec![true])));
        let rtn = parse_line(rtn.unwrap().0);
        assert_eq!(rtn, Ok(("", vec![true])));
        let rtn = parse_line("1\n 1\n\n");
        assert_eq!(rtn, Ok((" 1\n\n", vec![true])));
        let rtn = parse_line(rtn.unwrap().0);
        assert_eq!(rtn, Ok(("\n", vec![true])));
        let rtn = parse_line(rtn.unwrap().0);
        assert_eq!(
            rtn,
            Err(nom::Err::Error(nom::error::Error {
                input: "\n",
                code: nom::error::ErrorKind::ManyTill
            }))
        );
    }
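
If the goal is to get the rejected line back as a remainder, one option is to let the caller treat the error as "stop here" rather than changing parse_line. The sketch below is a dependency-free illustration of that contract, not the nom code from the question: parse_line here is a hand-written stand-in that is assumed to accept the same inputs as the original, and parse_lines is a hypothetical helper name.

```rust
/// Hand-written stand-in for the nom-based `parse_line` (an assumption made so
/// the sketch runs without dependencies). It accepts one or more `0`/`1`
/// tokens, each optionally preceded by spaces or tabs, followed by optional
/// spaces or tabs and a newline. On failure it returns the untouched input.
fn parse_line(input: &str) -> Result<(&str, Vec<bool>), &str> {
    let is_space = |c: char| c == ' ' || c == '\t';
    let mut rest = input;
    let mut tokens = Vec::new();
    loop {
        // Skip leading whitespace, then try to read a single token.
        let trimmed = rest.trim_start_matches(is_space);
        if let Some(r) = trimmed.strip_prefix('1') {
            tokens.push(true);
            rest = r;
        } else if let Some(r) = trimmed.strip_prefix('0') {
            tokens.push(false);
            rest = r;
        } else {
            // No further token: keep the position after the whitespace.
            rest = trimmed;
            break;
        }
    }
    // A line is valid only if it has at least one token and ends in '\n'.
    match rest.strip_prefix('\n') {
        Some(r) if !tokens.is_empty() => Ok((r, tokens)),
        _ => Err(input),
    }
}

/// Hypothetical helper: parses lines until `parse_line` rejects one, then
/// returns the unconsumed input together with the lines parsed so far.
fn parse_lines(mut input: &str) -> (&str, Vec<Vec<bool>>) {
    let mut lines = Vec::new();
    while let Ok((rest, tokens)) = parse_line(input) {
        lines.push(tokens);
        input = rest;
    }
    (input, lines)
}
```

With this shape, parse_lines("1\n 1\n\n") yields two parsed lines and leaves "\n" as the remainder, while parse_line("\n") on its own still reports an error. In nom, wrapping the parser in many0 should give the same stop-at-the-first-recoverable-error behavior, so the error from parse_line and the remainder you want are not in conflict.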