
Why is Chronic incorrectly parsing four-letter month abbreviations?


In Chronic 0.9.1, parsing Febr 2013 returns June 2013. Feb 2013 parses fine, but Febr 2013 does not.

I think the issue occurs when the month abbreviation has four letters.

To validate a date I use:

Chronic.parse(params[:date]).blank?

Is this a bug? Is there a workaround, or a better way to validate this?


Solution

  • Technically it's a bug, but I'm more inclined to call it a hole in their logic. Here's how Chronic::Repeater.scan_for_month_names decides what a month is:

    # File 'lib/chronic/repeater.rb', line 38
    
    def self.scan_for_month_names(token)
      scan_for token, RepeaterMonthName,
      {
        /^jan[:\.]?(uary)?$/ => :january,
        /^feb[:\.]?(ruary)?$/ => :february,
        /^mar[:\.]?(ch)?$/ => :march,
        /^apr[:\.]?(il)?$/ => :april,
        /^may$/ => :may,
        /^jun[:\.]?e?$/ => :june,
        /^jul[:\.]?y?$/ => :july,
        /^aug[:\.]?(ust)?$/ => :august,
        /^sep[:\.]?(t[:\.]?|tember)?$/ => :september,
        /^oct[:\.]?(ober)?$/ => :october,
        /^nov[:\.]?(ember)?$/ => :november,
        /^dec[:\.]?(ember)?$/ => :december
      }
    end
    

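    A month is recognized only as its three-letter abbreviation (optionally followed by . or :) or as its full name, with sept as the one extra form for September. Febr fits none of these patterns.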
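
    As a quick check, the February pattern from the listing above can be run against a few tokens on its own. This is only a sketch outside of Chronic; the constant name FEBRUARY_PATTERN is made up for the illustration.

```ruby
# Pattern copied from the scan_for_month_names listing above.
# FEBRUARY_PATTERN is a hypothetical name used only for this illustration.
FEBRUARY_PATTERN = /^feb[:\.]?(ruary)?$/

# Map each candidate token to whether the pattern accepts it.
matches = %w[feb feb. febr february].map { |token|
  [token, FEBRUARY_PATTERN.match?(token)]
}.to_h

matches.each { |token, hit| puts "#{token} => #{hit}" }
```

    Only febr is rejected, which is why the four-letter form never gets tagged as February.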

    You could extract that method from the source, modify the patterns to fit your needs, and override it, then submit the change as a patch so the tweak gets added to future revisions of the gem. Or, you could modify the incoming string before parsing: search for the three-letter abbreviations at the beginning of a word and trim the extraneous characters.
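
    The second option might look something like the sketch below. The helper normalize_month_abbreviations and the constant MONTH_NAMES are hypothetical names, not part of Chronic, and the sketch assumes it is acceptable to shorten any word of four or more letters that is a partial month name.

```ruby
# Hypothetical helper, not part of Chronic: trims partial month names
# such as "Febr" or "Sept" down to the three-letter form Chronic accepts.
MONTH_NAMES = %w[
  january february march april may june
  july august september october november december
].freeze

def normalize_month_abbreviations(text)
  # Only words of four or more letters are candidates; three-letter
  # abbreviations are already understood by Chronic.
  text.gsub(/\b[a-z]{4,}\b/i) do |word|
    month = MONTH_NAMES.find { |name| name.start_with?(word.downcase) }
    # Shorten the word only when it is a proper prefix of a month name;
    # full names and unrelated words pass through unchanged.
    month && word.length < month.length ? word[0, 3] : word
  end
end

puts normalize_month_abbreviations("Febr 2013")
```

    The cleaned string could then be handed to the parser, for example Chronic.parse(normalize_month_abbreviations(params[:date])).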


    OK, here's something to chew on:

    require 'abbrev'
    
    MONTHS = %w[
      january
      february
      march
      april
      may
      june
      july
      august
      september
      october
      november
      december
    ]
    
    MONTHS_ABBREV = Abbrev.abbrev(MONTHS)
    MONTHS_REGEX = /\b(?:j(?:a(?:n(?:u(?:a(?:ry?)?)?)?)?|u(?:ly?|ne?))|s(?:e(?:p(?:t(?:e(?:m(?:b(?:er?)?)?)?)?)?)?)?|a(?:u(?:g(?:u(?:st?)?)?)?|p(?:r(?:il?)?)?)|d(?:e(?:c(?:e(?:m(?:b(?:er?)?)?)?)?)?)?|f(?:e(?:b(?:r(?:u(?:a(?:ry?)?)?)?)?)?)?|n(?:o(?:v(?:e(?:m(?:b(?:er?)?)?)?)?)?)?|o(?:c(?:t(?:o(?:b(?:er?)?)?)?)?)?|ma(?:r(?:ch?)?|y))\b/i
    
    %w[j ja jan janu january f fe feb febr february].each do |m|
      puts "#{ m } => #{ MONTHS_ABBREV[m[MONTHS_REGEX]] }" 
    end
    

    Which outputs:

    j =>
    ja => january
    jan => january
    janu => january
    january => january
    f => february
    fe => february
    feb => february
    febr => february
    february => february
    

    In other words, j isn't unique, so there isn't a hit. ja is unique and is associated with january, as are the rest of the ja... tests. f is unique so it hits, as do all the rest of the f... tests.

    What does Abbrev.abbrev do? It maps every unambiguous prefix of each word passed in, plus the word itself, to the whole word. Here's what it looks like if I only use four months:

    require 'abbrev'
    
    MONTHS = %w[
      march
      may
      june
      july
    ]
    
    MONTHS_ABBREV = Abbrev.abbrev(MONTHS)
    pp MONTHS_ABBREV
    

    Resulting in:

    {"marc"=>"march",
     "mar"=>"march",
     "jun"=>"june",
     "jul"=>"july",
     "march"=>"march",
     "may"=>"may",
     "june"=>"june",
     "july"=>"july"}
    

    Those make wonderful seed values for a regular expression.

    Where did I get MONTHS_REGEX? Heh... it was generated by some magical Perl code using a little-known module called Regexp::Assemble, which I dearly miss in Ruby. It's skanky... no, it's... diabolically good and closely tied to how Perl does things, and makes my head hurt when I read through it, otherwise I'd have ported it.
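
    Short of porting Regexp::Assemble, a less compact pattern can be seeded from the Abbrev keys in plain Ruby with Regexp.union. This is a rough sketch, not the Perl-generated regex; MONTH_LIST, ABBREVIATIONS, UNION_REGEX and expand_month are names invented for the example.

```ruby
require 'abbrev'

# Hypothetical names for this sketch; none of them come from Chronic.
MONTH_LIST = %w[
  january february march april may june
  july august september october november december
].freeze

ABBREVIATIONS = Abbrev.abbrev(MONTH_LIST)

# Sort the longest keys first so the alternation prefers the fullest match.
# .source is interpolated (rather than the Regexp itself) so that the
# case-insensitive flag on the outer pattern applies to the alternatives.
UNION_REGEX = /\b(?:#{Regexp.union(ABBREVIATIONS.keys.sort_by { |key| -key.length }).source})\b/i

# Returns the full month name for an unambiguous abbreviation, or nil.
def expand_month(word)
  ABBREVIATIONS[word[UNION_REGEX]&.downcase]
end

%w[j ja febr Febr].each do |m|
  puts "#{m} => #{expand_month(m)}"
end
```

    The pattern is much longer than the assembled one because nothing is factored into shared prefixes, but for the sample inputs above it should give the same hits and misses.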