In Chronic 0.9.1, when parsing "Febr 2013" I'm getting a result of "June 2013". "Feb 2013" is parsed fine, but "Febr 2013" is not.
I think the issue occurs when the month abbreviation has four letters.
I need to either parse "Febr 2013" as "February 2013", or reject "Febr 2013" as invalid.
To validate a date I use:
Chronic.parse(params[:date]).blank?
Is this a bug? Is there a workaround? Or is there a right way to validate this?
Technically it's a bug, but I'm more inclined to call it a hole in their logic. Here's how Chronic::Repeater.scan_for_month_names decides what a month is:
# File 'lib/chronic/repeater.rb', line 38
def self.scan_for_month_names(token)
  scan_for token, RepeaterMonthName,
  {
    /^jan[:\.]?(uary)?$/ => :january,
    /^feb[:\.]?(ruary)?$/ => :february,
    /^mar[:\.]?(ch)?$/ => :march,
    /^apr[:\.]?(il)?$/ => :april,
    /^may$/ => :may,
    /^jun[:\.]?e?$/ => :june,
    /^jul[:\.]?y?$/ => :july,
    /^aug[:\.]?(ust)?$/ => :august,
    /^sep[:\.]?(t[:\.]?|tember)?$/ => :september,
    /^oct[:\.]?(ober)?$/ => :october,
    /^nov[:\.]?(ember)?$/ => :november,
    /^dec[:\.]?(ember)?$/ => :december
  }
end
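Month names have to be either the three-letter abbreviation (optionally followed by a period or colon) or the entire name. The only four-letter form the patterns accept is "sept", so "febr" doesn't match the February pattern.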
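A quick way to check this is to run the February pattern from the listing above against a few tokens. This is only a sketch: FEBRUARY_PATTERN is a name made up for the example, and the tokens are lowercase because the pattern has no case-insensitive flag.

```ruby
# The February pattern, copied from the listing above.
# FEBRUARY_PATTERN is a name made up for this example.
FEBRUARY_PATTERN = /^feb[:\.]?(ruary)?$/

%w[feb feb. febr february].each do |token|
  # match? returns true or false without setting match globals.
  puts "#{token} => #{FEBRUARY_PATTERN.match?(token)}"
end
```

Only "febr" comes back false: after "feb" the pattern allows an optional period or colon and then either nothing or the whole "ruary" suffix.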
You could extract that method from the source, modify the patterns to fit your needs, then overwrite that method, along with submitting it as a patch so the tweak gets added to future revisions of the gem. Or, you could modify the incoming string by searching for the three-letter abbreviations at the beginning of a word, and trimming extraneous characters.
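The second option, cleaning up the incoming string, might look something like the sketch below. It is not part of Chronic: trim_month_abbreviations and FULL_MONTH_NAMES are hypothetical names, and the approach assumes any word of four or more letters that is a prefix of a month name should be cut back to its first three letters.

```ruby
# Hypothetical helper, not part of Chronic: trims partial month names
# down to the three-letter form that the patterns above accept.
FULL_MONTH_NAMES = %w[
  january february march april may june
  july august september october november december
].freeze

def trim_month_abbreviations(text)
  # Look only at alphabetic words of four or more letters.
  text.gsub(/\b[a-z]{4,}\b/i) do |word|
    # A word counts as a month if it is a prefix of a full month name.
    month = FULL_MONTH_NAMES.find { |name| name.start_with?(word.downcase) }
    month ? word[0, 3] : word
  end
end

puts trim_month_abbreviations('Febr 2013') # prints "Feb 2013"
```

The trimmed string could then be handed to Chronic.parse as before. Words that aren't a prefix of a month name are left untouched, so other text in the input passes through unchanged.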
OK, here's something to chew on:
require 'abbrev'
MONTHS = %w[
january
february
march
april
may
june
july
august
september
october
november
december
]
MONTHS_ABBREV = Abbrev.abbrev(MONTHS)
MONTHS_REGEX = /\b(?:j(?:a(?:n(?:u(?:a(?:ry?)?)?)?)?|u(?:ly?|ne?))|s(?:e(?:p(?:t(?:e(?:m(?:b(?:er?)?)?)?)?)?)?)?|a(?:u(?:g(?:u(?:st?)?)?)?|p(?:r(?:il?)?)?)|d(?:e(?:c(?:e(?:m(?:b(?:er?)?)?)?)?)?)?|f(?:e(?:b(?:r(?:u(?:a(?:ry?)?)?)?)?)?)?|n(?:o(?:v(?:e(?:m(?:b(?:er?)?)?)?)?)?)?|o(?:c(?:t(?:o(?:b(?:er?)?)?)?)?)?|ma(?:r(?:ch?)?|y))\b/i
%w[j ja jan janu january f fe feb febr february].each do |m|
  puts "#{ m } => #{ MONTHS_ABBREV[m[MONTHS_REGEX]] }"
end
Which outputs:
j =>
ja => january
jan => january
janu => january
january => january
f => february
fe => february
feb => february
febr => february
february => february
In other words, "j" isn't unique, so there isn't a hit. "ja" is unique and is associated with "january", as are the rest of the "ja..." tests. "f" is unique so it hits, as do all the rest of the "f..." tests.
What does Abbrev.abbrev do? It takes the words passed in and maps every unambiguous prefix of each word, from the shortest unique one up to the whole word, back to that word. Here's what it looks like if I only use four months:
require 'abbrev'
MONTHS = %w[
march
may
june
july
]
MONTHS_ABBREV = Abbrev.abbrev(MONTHS)
pp MONTHS_ABBREV
Resulting in:
{"marc"=>"march",
"mar"=>"march",
"jun"=>"june",
"jul"=>"july",
"march"=>"march",
"may"=>"may",
"june"=>"june",
"july"=>"july"}
Those make wonderful seed values for a regular expression.
Where did I get MONTHS_REGEX? Heh... it came from some magical Perl code using a little-known module called Regexp::Assemble, that I dearly miss in Ruby. It's skanky... no, it's... diabolically good and closely tied to how Perl does things, and makes my head hurt when I read through it, otherwise I'd have ported it.
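Without Regexp::Assemble, a less compact pattern can still be built in plain Ruby by feeding the Abbrev keys to Regexp.union. What follows is a rough sketch rather than a port: MONTH_PATTERN and expand_month are hypothetical names, and it deliberately ignores prefixes shorter than three letters (an assumption of this example, not something the regex above does) so that a stray "s" or "d" isn't read as a month. Depending on the Ruby version, abbrev may need to be installed as a gem.

```ruby
require 'abbrev'

MONTH_NAMES = %w[
  january february march april may june
  july august september october november december
].freeze

# Every unambiguous prefix mapped to its full month name.
MONTH_ABBREVIATIONS = Abbrev.abbrev(MONTH_NAMES)

# Assumption for this sketch: skip prefixes shorter than three letters.
CANDIDATES = MONTH_ABBREVIATIONS.keys.select { |key| key.length >= 3 }

# Longest alternatives first; .source lets the outer /i flag apply.
ALTERNATION = Regexp.union(CANDIDATES.sort_by { |key| -key.length }).source
MONTH_PATTERN = /\b(?:#{ALTERNATION})\b/i

# Hypothetical helper: expands the first month abbreviation it finds.
def expand_month(text)
  text.sub(MONTH_PATTERN) { |hit| MONTH_ABBREVIATIONS[hit.downcase].capitalize }
end

puts expand_month('Febr 2013') # prints "February 2013"
```

The alternation is much longer than what Regexp::Assemble produces, since nothing factors out the shared prefixes, but for twelve month names that shouldn't matter much.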