I'm working on a tiny duration-parsing library written in Rust, built on the nom library. In this library, I define a `second`
parser combinator function. Its responsibility is to parse the various accepted textual spellings of the seconds unit.
pub fn duration(input: &str) -> IResult<&str, std::time::Duration> {
    // Some code combining the various time format combinators
    // to match the format "10 days, 8 hours, 7 minutes and 6 seconds"
}

pub fn seconds(input: &str) -> IResult<&str, u64> {
    terminated(unsigned_integer_64, preceded(multispace0, second))(input)
}

fn second(input: &str) -> IResult<&str, &str> {
    alt((
        tag("seconds"),
        tag("second"),
        tag("secs"),
        tag("sec"),
        tag("s"),
    ))(input)
}
So far, the tag combinator was behaving as I expected. However, I recently discovered that the following assertion fails, and that it fails by design:
assert!(second("se").is_err());
Indeed, the documentation states that "The input data will be compared to the tag combinator’s argument and will return the part of the input that matches the argument". Here, tag("s") matches the leading "s" of "se" and hands back "e" as the rest, so the result is Ok rather than an error.
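To make the prefix-matching behavior concrete, here is a dependency-free sketch that models tag and alt with `str::strip_prefix`. It is only a model for illustration, not nom's implementation, and the names `tag_model` and `second_model` are made up for this example:

```rust
/// Models nom's `tag`: succeeds when `input` starts with `word`
/// and returns `(rest, matched)`, mirroring nom's `Ok((rest, matched))`.
fn tag_model<'a>(word: &str, input: &'a str) -> Option<(&'a str, &'a str)> {
    input
        .strip_prefix(word)
        .map(|rest| (rest, &input[..word.len()]))
}

/// Models `alt((tag("seconds"), ..., tag("s")))`:
/// the first alternative that matches wins.
fn second_model(input: &str) -> Option<(&str, &str)> {
    ["seconds", "second", "secs", "sec", "s"]
        .iter()
        .find_map(|word| tag_model(word, input))
}

fn main() {
    // "se" starts with "s", so the last alternative matches and "e" is left over.
    assert_eq!(second_model("se"), Some(("e", "s")));
    // An input that does not start with any of the words is rejected.
    assert_eq!(second_model("minutes"), None);
    println!("{:?}", second_model("se"));
}
```

The success on "se" is the same thing the real parser reports: a match on a prefix, with the unparsed remainder returned alongside it.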
However, as my example hopefully illustrates, what I would like is some flavor of tag that fails if the whole input cannot be parsed. I looked into explicitly checking whether there is a rest after parsing the input, and found that it works. I also explored, unsuccessfully, some flavors of the `complete` and `take` combinators to achieve that.
What would be an idiomatic way to parse an "exact match" of a word, and fail on a partial result (that would return a rest)?
You can use the all_consuming combinator, which succeeds only if its child parser has consumed the whole input:
// nom 6.1.2
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::combinator::all_consuming;
use nom::IResult;

fn main() {
    assert!(second("se").is_err());
}

fn second(input: &str) -> IResult<&str, &str> {
    all_consuming(alt((
        tag("seconds"),
        tag("second"),
        tag("secs"),
        tag("sec"),
        tag("s"),
    )))(input)
}
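As a rough mental model (an assumption-laden sketch, not nom's actual code), all_consuming runs the inner parser and turns a success that still has leftover input into a failure. The std-only example below imitates that; `all_consuming_model` and `second_model` are hypothetical names used only here:

```rust
/// Stand-in for the `second` parser: returns `(rest, matched)` on success.
fn second_model(input: &str) -> Option<(&str, &str)> {
    ["seconds", "second", "secs", "sec", "s"]
        .iter()
        .find_map(|word| {
            input
                .strip_prefix(*word)
                .map(|rest| (rest, &input[..word.len()]))
        })
}

/// Models `all_consuming`: keep the inner result only if nothing is left to parse.
fn all_consuming_model<'a>(
    parser: impl Fn(&'a str) -> Option<(&'a str, &'a str)>,
    input: &'a str,
) -> Option<(&'a str, &'a str)> {
    parser(input).filter(|(rest, _)| rest.is_empty())
}

fn main() {
    // "s" matches but "e" is left over, so the wrapped parser fails.
    assert_eq!(all_consuming_model(second_model, "se"), None);
    // The whole input is consumed, so the match is kept.
    assert_eq!(all_consuming_model(second_model, "sec"), Some(("", "sec")));
    // Anything after the word also counts as leftover input.
    assert_eq!(all_consuming_model(second_model, "seconds and more"), None);
}
```

The last assertion shows the trade-off: a parser wrapped this way only succeeds at the very end of the input, so it fits a unit that comes last but not one that is followed by more text.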
I think I misunderstood your original question. Maybe this is closer to what you need. The key is that you should write smaller parsers, and then combine them:
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::character::complete::digit1;
use nom::combinator::all_consuming;
use nom::sequence::{terminated, tuple};
use nom::IResult;

#[derive(Debug)]
struct Time {
    min: u32,
    sec: u32,
}

fn main() {
    // OK
    let parsed = time("10 minutes, 5 seconds");
    println!("{:?}", parsed);

    // OK
    let parsed = time("10 mins, 5 s");
    println!("{:?}", parsed);

    // Error -> although `min` is a valid tag, it would expect `, ` afterwards, instead of `ts`
    let parsed = time("10 mints, 5 s");
    println!("{:?}", parsed);

    // Error -> there must not be anything left after "5 s"
    let parsed = time("10 mins, 5 s, ");
    println!("{:?}", parsed);

    // Error -> although it starts with `sec` which is a valid tag, it will fail, because it would expect EOF
    let parsed = time("10 min, 5 sections");
    println!("{:?}", parsed);
}

fn time(input: &str) -> IResult<&str, Time> {
    // parse the minutes section and **expect** a delimiter, because there **must** be another section afterwards
    let (rem, min) = terminated(minutes_section, delimiter)(input)?;

    // parse the seconds section and **expect** EOF - i.e. there should not be any input left to parse
    let (rem, sec) = all_consuming(seconds_section)(rem)?;

    // rem should be an empty slice
    IResult::Ok((rem, Time { min, sec }))
}

// This function combines several parsers to parse the minutes section:
// NUMBER[sep]TAG-MINUTES
fn minutes_section(input: &str) -> IResult<&str, u32> {
    let (rem, (min, _sep, _tag)) = tuple((number, separator, minutes))(input)?;
    IResult::Ok((rem, min))
}

// This function combines several parsers to parse the seconds section:
// NUMBER[sep]TAG-SECONDS
fn seconds_section(input: &str) -> IResult<&str, u32> {
    let (rem, (sec, _sep, _tag)) = tuple((number, separator, seconds))(input)?;
    IResult::Ok((rem, sec))
}

fn number(input: &str) -> IResult<&str, u32> {
    digit1(input).map(|(remaining, number)| {
        // it can panic if the string represents a number
        // that does not fit into u32
        let n = number.parse().unwrap();
        (remaining, n)
    })
}
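The number parser above panics on overflow, as its comment notes. A way around that is to propagate the parse error instead of unwrapping it; in nom this is commonly written with the map_res combinator wrapped around digit1. The std-only sketch below shows the same idea without the crate, using a hypothetical `number_model` function and a plain `String` error type chosen just for this illustration:

```rust
/// Splits the leading ASCII digits off `input` (roughly what `digit1` does)
/// and parses them as `u32`, returning an error instead of panicking on overflow.
fn number_model(input: &str) -> Result<(&str, u32), String> {
    // Index of the first non-digit character, or the full length if all are digits.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected at least one digit in {:?}", input));
    }
    let (digits, rest) = input.split_at(end);
    // `parse` fails on values above `u32::MAX`; forward that failure to the caller.
    let n = digits.parse::<u32>().map_err(|e| e.to_string())?;
    Ok((rest, n))
}

fn main() {
    assert_eq!(number_model("10 mins"), Ok((" mins", 10)));
    // Too large for u32: reported as an error, no panic.
    assert!(number_model("99999999999 s").is_err());
    // No leading digits at all.
    assert!(number_model("mins").is_err());
}
```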

fn minutes(input: &str) -> IResult<&str, &str> {
    alt((
        tag("minutes"),
        tag("minute"),
        tag("mins"),
        tag("min"),
        tag("m"),
    ))(input)
}

fn seconds(input: &str) -> IResult<&str, &str> {
    alt((
        tag("seconds"),
        tag("second"),
        tag("secs"),
        tag("sec"),
        tag("s"),
    ))(input)
}

// This function parses the separator between the number and the tag:
// N<separator>tag -> 5[sep]minutes
fn separator(input: &str) -> IResult<&str, &str> {
    tag(" ")(input)
}

// This function parses the delimiter between the sections:
// X minutes<delimiter>Y seconds -> 1 min[delimiter]2 sec
fn delimiter(input: &str) -> IResult<&str, &str> {
    tag(", ")(input)
}
Here I have created a set of basic parsers for the building blocks, such as "number", "separator", "delimiter", and the various markers (min, sec, etc.). None of those expect to be "whole words". Instead, you should use combinators such as terminated, tuple, and all_consuming to mark where the "exact word" ends.