I am looking into parslet to write alot of data import code. Overall, the library looks good, but I'm struggling with one thing. Alot of our input files are fixed width, and the widths differ between formats, even if the actual field doesn't. For example, we might get a file that has a 9-character currency, and another that has 11-characters (or whatever). Does anyone know how to define a fixed width constraint on a parslet atom?
Ideally, I would like to be able to define an atom that understands currency (with optional dollar signs, thousand separators, etc...) And then I would be able to, on the fly, create a new atom based on the old one that is exactly equivalent, except that it parses exactly N characters.
Does such a combinator exist in parslet? If not, would it be possible/difficult to write one myself?
What about something like this...
class MyParser < Parslet::Parser
def initialize(widths)
@widths = widths
super
end
rule(:currency) {...}
rule(:fixed_c) {currency.fixed(@widths[:currency])}
rule(:fixed_str) {str("bob").fixed(4)}
end
puts MyParser.new.fixed_str.parse("bob").inspect
This will fail with:
"Expected 'bob' to be 4 long at line 1 char 1"
Here's how you do it:
require 'parslet'
class Parslet::Atoms::FixedLength < Parslet::Atoms::Base
attr_reader :len, :parslet
def initialize(parslet, len, tag=:length)
super()
raise ArgumentError,
"Asking for zero length of a parslet. (#{parslet.inspect} length #{len})" \
if len == 0
@parslet = parslet
@len = len
@tag = tag
@error_msgs = {
:lenrep => "Expected #{parslet.inspect} to be #{len} long",
:unconsumed => "Extra input after last repetition"
}
end
def try(source, context, consume_all)
start_pos = source.pos
success, value = parslet.apply(source, context, false)
return succ(value) if success && value.str.length == @len
context.err_at(
self,
source,
@error_msgs[:lenrep],
start_pos,
[value])
end
precedence REPETITION
def to_s_inner(prec)
parslet.to_s(prec) + "{len:#{@len}}"
end
end
module Parslet::Atoms::DSL
def fixed(len)
Parslet::Atoms::FixedLength.new(self, len)
end
end