java, weka, simpledateformat, arff

WEKA parses my dates with SimpleDateFormat... unless they involve 2 o'clock


I have a large ARFF file with data that looks something like this:

555,"2011-03-13 01:50:48.000",0
540,"2011-03-13 02:10:19.000",0

To help parse it, I declared the second attribute like this:

@attribute RecordedOn date "yyyy-MM-dd HH:mm:ss.SSS"

The parser, which uses Java's SimpleDateFormat, works fine for the first line (and the couple million lines very similar to it), but chokes on a few lines, like the second one. I've noticed that it only chokes on lines whose hour is "02"; in fact, the second line parses fine if I change it to 540,"2011-03-13 01:10:19.000",0. To add to the mystery, some lines with an hour of 02 parse fine anyway, like this one: 1,"2006-12-16 02:58:51.000",111
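The symptom can be reproduced outside WEKA with a minimal standalone sketch. It assumes two things not stated in the question: that WEKA's parser is configured non-leniently, and that the text is interpreted in a US zone such as America/New_York. The class `DstGapRepro` and its helper `tryParse` are hypothetical names used only for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DstGapRepro {

    // Same pattern as the ARFF attribute declaration.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss.SSS";

    /**
     * Parses text strictly as a local time in zoneId; returns null if parsing fails.
     * Non-lenient mode is an assumption about how WEKA sets up its parser.
     */
    static Date tryParse(String text, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setLenient(false);                         // reject field values that cannot occur
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId)); // interpret the text in this zone
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "2011-03-13 01:50:48.000",
            "2011-03-13 02:10:19.000",
            "2006-12-16 02:58:51.000"
        };
        for (String s : samples) {
            Date d = tryParse(s, "America/New_York");
            System.out.println(s + " -> " + (d == null ? "ParseException" : "ok"));
        }
    }
}
```

Under these assumptions the first and third samples should parse and the second should fail, which matches the pattern described in the question. The same 02:10 string parses without complaint if the zone is changed to "UTC".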

So does anyone know what's happening? Any advice? Thanks in advance.


Solution

  • You are almost certainly interpreting the dates as local times in a time zone that observes Daylight Saving Time. March 13, 2011 was the start of Daylight Saving Time in the United States: the clock jumped from 01:59:59 straight to 03:00:00, skipping the entire 2 o'clock hour. So "2011-03-13 02:10:19.000" local time never occurred in, e.g., New York City, and a non-lenient parser rejects it. This also explains why "2006-12-16 02:58:51.000" parses fine: December 16 is not a transition date, so 2:58 a.m. actually existed that day.
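One way to check whether a given timestamp falls into such a gap is with java.time (Java 8+). A zone's `ZoneRules.getValidOffsets(LocalDateTime)` returns an empty list for a wall-clock time that never happened. The sketch below uses America/New_York as an example zone; `DstGapCheck` and `isInGap` are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

public class DstGapCheck {

    /** True if this wall-clock time never occurred in the zone, i.e. it lies in a DST gap. */
    static boolean isInGap(LocalDateTime local, ZoneId zone) {
        ZoneRules rules = zone.getRules();
        // A gap time has no valid UTC offset; a normal time has one, an overlap time two.
        return rules.getValidOffsets(local).isEmpty();
    }

    public static void main(String[] args) {
        ZoneId ny = ZoneId.of("America/New_York");
        LocalDateTime t = LocalDateTime.of(2011, 3, 13, 2, 10, 19);
        System.out.println("in gap: " + isInGap(t, ny));

        // For a gap or overlap time, getTransition returns the transition involved (otherwise null).
        ZoneOffsetTransition tr = ny.getRules().getTransition(t);
        if (tr != null) {
            System.out.println("clock jumps from " + tr.getDateTimeBefore()
                    + " to " + tr.getDateTimeAfter());
        }
    }
}
```

For the failing row this should report that the time is in a gap and that the clock jumped from 02:00 to 03:00 local time. If the data was actually recorded in UTC or another zone without DST (an assumption only the data's source can confirm), one possible workaround is to start the JVM with `-Duser.timezone=UTC`. This presumes WEKA's parser falls back to the JVM default time zone.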