ruby parsing parslet

How to define a fixed-width constraint in parslet


I am looking into parslet to write a lot of data import code. Overall, the library looks good, but I'm struggling with one thing. A lot of our input files are fixed width, and the widths differ between formats even when the field itself doesn't. For example, one file might have a 9-character currency field and another an 11-character one. Does anyone know how to define a fixed-width constraint on a parslet atom?

Ideally, I would like to define an atom that understands currency (with optional dollar signs, thousands separators, etc.) and then, on the fly, create a new atom from it that is exactly equivalent except that it parses exactly N characters.

Does such a combinator exist in parslet? If not, would it be possible/difficult to write one myself?


Solution

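  • What about something like this? A fixed method on any atom (implemented further down) requires the match to be exactly N characters long: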

    class MyParser < Parslet::Parser
        # widths maps field names to their fixed widths, e.g. {:currency => 9}
        def initialize(widths = {})
            @widths = widths
            super()
        end
    
        rule(:currency)  {...}
        rule(:fixed_c)   {currency.fixed(@widths[:currency])}
    
        rule(:fixed_str) {str("bob").fixed(4)}
    end 
    
    puts MyParser.new.fixed_str.parse("bob").inspect
    

    Because "bob" is only 3 characters long, this will fail with:

    "Expected 'bob' to be 4 long at line 1 char 1"
    

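    Here's how you do it: define a new atom class that wraps another atom and checks the matched length, then add a fixed method to the DSL: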

    require 'parslet'
    
    # Wraps another atom and succeeds only if that atom matched exactly len characters.
    class Parslet::Atoms::FixedLength < Parslet::Atoms::Base  
      attr_reader :len, :parslet
      def initialize(parslet, len, tag=:length)
        super()
    
        raise ArgumentError, 
          "Asking for zero length of a parslet. (#{parslet.inspect} length #{len})" \
          if len == 0
    
        @parslet = parslet
        @len = len
        @tag = tag
        @error_msgs = {
          :lenrep  => "Expected #{parslet.inspect} to be #{len} long", 
          :unconsumed => "Extra input after last repetition"
        }
      end
    
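      def try(source, context, consume_all)
        start_pos = source.pos
    
        # Run the wrapped atom first; consume_all is false because more input may follow.
        success, value = parslet.apply(source, context, false)
    
        # Assumes the wrapped atom yields a single slice, which responds to str.
        return succ(value) if success && value.str.length == @len
    
        # Only a failed inner parse yields an error cause that can be attached as a child.
        context.err_at(
          self,
          source,
          @error_msgs[:lenrep],
          start_pos,
          success ? [] : [value])
      end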
    
      precedence REPETITION
      def to_s_inner(prec)
        parslet.to_s(prec) + "{len:#{@len}}"
      end
    end
    
    module Parslet::Atoms::DSL
      def fixed(len)
        Parslet::Atoms::FixedLength.new(self, len)
      end
    end
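
The core check in try (run the wrapped parser, then accept the result only if it is exactly len characters long) can also be tried out without Parslet. The following is a minimal sketch using Ruby's standard StringScanner, not Parslet code; the scan_fixed helper and the CURRENCY pattern are hypothetical names made up for illustration.

```ruby
require 'strscan'

# Hypothetical currency pattern (an assumption, not from the original post):
# optional dollar sign, digits with optional thousands separators, optional cents.
CURRENCY = /\$?\d{1,3}(?:,\d{3})*(?:\.\d{2})?/

# Hypothetical helper mirroring FixedLength#try: run the inner pattern, then
# accept the match only if it is exactly `width` characters long.
def scan_fixed(scanner, pattern, width)
  start = scanner.pos
  matched = scanner.scan(pattern)
  return matched if matched && matched.length == width

  scanner.pos = start # rewind so the caller can try something else
  nil
end

scanner = StringScanner.new("$1,234.56USD")
p scan_fixed(scanner, CURRENCY, 11) # => nil (the match is 9 characters, not 11)
p scan_fixed(scanner, CURRENCY, 9)  # => "$1,234.56"
p scanner.rest                      # => "USD"
```

As in the Parslet atom, the width is a parameter of the wrapper rather than of the currency pattern, so the same pattern can be reused for a 9-character field in one format and an 11-character field in another.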