java regex

java regex match up to a string


Regex'ers:

How can I construct a Java regex to match strings that are lexicographically <= a given date string?

For example, suppose the input is in YYYY-MM-DD format:

2014-01-20 MLK day
2007-04-14 'twas a very good day
2014-05-19 is today
1998-11-30 someone's birthday

I'd like the filter to return all lines dated on or before, say, Groundhog Day of this year, 2014-02-02; so in the above list the regex would return all lines except today's. (I don't want to convert the dates to epoch time; I'd like to just pass a regex to a class that runs a map/reduce job, so that my input record reader can use the regex as it constructs bundles to deliver to the mappers.)

TIA,


Solution

  • It's nearly impossible to do <=-style logic with regular expressions. Technically you could, but you'd have to map out every possible case, and if you wanted to change the date you're comparing against, the whole expression would change. Instead, I'd just match all the dates/values and then use a date parser to check whether each date is less than or equal to the cutoff date. Here's an expression to get you started:

    (\d{4}-\d{2}-\d{2})\s+(.*)
    

    Then the date will be in capture group one. If it is <= Groundhog Day, the rest of the line is the value in capture group two.
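
    As a minimal sketch of that match-then-compare approach: the class and method names below (DateLineFilter, onOrBefore) are made up for illustration, and it assumes Java 8+ so that java.time.LocalDate is available. Lines that don't match the expression, or whose date doesn't parse, are simply skipped.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class DateLineFilter {
    // Same expression as above: group 1 is the date, group 2 is the rest of the line.
    private static final Pattern LINE =
            Pattern.compile("(\\d{4}-\\d{2}-\\d{2})\\s+(.*)");

    /** Returns group 2 of every line whose date is on or before the cutoff. */
    static List<String> onOrBefore(List<String> lines, LocalDate cutoff) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // not a "date text" line
            }
            try {
                // LocalDate.parse expects ISO yyyy-MM-dd by default.
                LocalDate date = LocalDate.parse(m.group(1));
                if (!date.isAfter(cutoff)) {
                    kept.add(m.group(2));
                }
            } catch (DateTimeParseException e) {
                // Digits in the right shape but not a real date (e.g. month 13): skip.
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2014-01-20 MLK day",
                "2007-04-14 'twas a very good day",
                "2014-05-19 is today",
                "1998-11-30 someone's birthday");
        onOrBefore(lines, LocalDate.of(2014, 2, 2)).forEach(System.out::println);
    }
}
```

    Changing the cutoff here only means passing a different LocalDate; the expression itself stays the same.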


    To show how complicated it is to do <= logic with a regular expression, I whipped together a quick expression to match integers > 0 and <= 27.

    ^([1-9]|1[0-9]|2[0-7])$
    

    As you can see, we pretty much need to map out each scenario. You can imagine how much more of a headache this would be with a date. And you couldn't just swap in a different date such as "2014-02-02"; you'd need to redo the majority of the expression.
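
    If the record reader really can only take a regex, one option is to generate the expression from the cutoff instead of writing it by hand. The sketch below is only an illustration: BoundRegex and atMost are hypothetical names, and it assumes every line starts with a zero-padded string of the same shape as the bound, followed by whitespace. It compares strings digit by digit and does not validate dates, so something like 2014-00-99 would also match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class BoundRegex {
    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /**
     * Builds a regex for lines starting with a same-shaped string that is
     * lexicographically <= bound, e.g. bound = "2014-02-02".
     */
    static String atMost(String bound) {
        List<String> alternatives = new ArrayList<>();
        for (int i = 0; i < bound.length(); i++) {
            char c = bound.charAt(i);
            if (!isAsciiDigit(c) || c == '0') {
                continue; // no smaller digit can go in this position
            }
            // Same prefix as the bound, a strictly smaller digit here,
            // then anything of the right shape afterwards.
            StringBuilder alt = new StringBuilder(Pattern.quote(bound.substring(0, i)));
            alt.append("[0-").append((char) (c - 1)).append(']');
            for (int j = i + 1; j < bound.length(); j++) {
                char r = bound.charAt(j);
                alt.append(isAsciiDigit(r) ? "\\d" : Pattern.quote(String.valueOf(r)));
            }
            alternatives.add(alt.toString());
        }
        alternatives.add(Pattern.quote(bound)); // the "equal to" case
        return "^(?:" + String.join("|", alternatives) + ")\\s.*";
    }

    public static void main(String[] args) {
        // Prints the generated expression for the cutoff used in the answer.
        System.out.println(atMost("2014-02-02"));
    }
}
```

    The generated pattern has one alternative per nonzero digit of the bound plus one for the bound itself, which is the "map out each scenario" structure described above, and a different cutoff produces a different expression.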