java regex date simpledateformat java-6

Generic support for ISO 8601 format in Java 6


Java 7 introduced support in the SimpleDateFormat class for ISO 8601 time zone formats, via the pattern letter X (instead of lowercase z or uppercase Z). Supporting such formats in Java 6 requires preprocessing, so the question is which approach is best.

This new format is a superset of Z (uppercase Z), with 2 additional variations:

  1. The "minutes" field is optional (i.e., 2-digit instead of 4-digit timezones are valid)
  2. A colon character (':') can be used for separating the 2-digit "hours" field from the 2-digit "minutes" field.

So, as one can observe from the Java 7 documentation of SimpleDateFormat, the following 3 formats are now valid (instead of only the second one covered by Z in Java 6) and, of course, equivalent:

  1. -08
  2. -0800
  3. -08:00

As discussed in an earlier question about a special case of supporting such an "expanded" timezone format, always with ':' as a separator, the best approach for backporting the Java 7 functionality into Java 6 is to subclass the SimpleDateFormat class and override its parse() method, i.e.:

public Date parse(String date, ParsePosition pos)
{
    String iso = ... // Replace the X with a Z timezone string, using a regex

    if (iso.length() == date.length())
    {
        return null; // Not an ISO 8601 date
    }

    Date parsed = super.parse(iso, pos);

    if (parsed != null)
    {
        pos.setIndex(pos.getIndex()+1); // Adjust for ':'
    }

    return parsed;
}

Note that the subclassed SimpleDateFormat objects above must be initialized with the corresponding Z-based pattern, i.e. if the subclass is ExtendedSimpleDateFormat and you want to parse dates conforming to the pattern yyyy-MM-dd'T'HH:mm:ssX, then you should use objects instantiated as

new ExtendedSimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");

In the aforementioned earlier question the regex :(?=[0-9]{2}$) has been suggested for getting rid of the ':' and in a similar question the regex (?<=[+-]\d{2})$ has been suggested for appending the "minute" field as 00, if needed.

Obviously, running the 2 replacements successively can be used for achieving full functionality. So, the iso local variable in the overridden parse() method would be set as

iso = date.replaceFirst(":(?=[0-9]{2}$)","");

or

iso = iso.replaceFirst("(?<=[+-]\\d{2})$", "00");

with an if check in between to make sure that the pos value is also set properly later on and also for the length() comparison earlier.
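
A minimal sketch of that two-pass variant, assuming the subclass name ExtendedSimpleDateFormat used earlier. The delta variable is an illustrative addition, not part of the snippet above: it records how many characters the rewrite removed or added, so that pos can be corrected both when a ':' was dropped and when "00" was appended. Strings that already use the plain Z form (e.g. -0800) pass through unchanged here instead of being rejected.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ExtendedSimpleDateFormat extends SimpleDateFormat {
    private static final long serialVersionUID = 1L;

    // The pattern passed in must be the Z-based one, e.g. yyyy-MM-dd'T'HH:mm:ssZ.
    public ExtendedSimpleDateFormat(String zBasedPattern) {
        super(zBasedPattern);
    }

    @Override
    public Date parse(String date, ParsePosition pos) {
        // Pass 1: "-08:00" becomes "-0800" (one character shorter).
        String iso = date.replaceFirst(":(?=[0-9]{2}$)", "");

        if (iso.length() == date.length()) {
            // Pass 2, only if no ':' was removed: "-08" becomes "-0800"
            // (two characters longer). A plain "-0800" is left untouched.
            iso = iso.replaceFirst("(?<=[+-]\\d{2})$", "00");
        }

        // Positive if the rewrite shortened the text, negative if it grew.
        int delta = date.length() - iso.length();

        Date parsed = super.parse(iso, pos);

        if (parsed != null) {
            // Report the position in terms of the caller's original string.
            pos.setIndex(pos.getIndex() + delta);
        }

        return parsed;
    }
}
```

Inputs that are not dates at all simply fail inside super.parse() and come back as null, so no separate length-based rejection is needed in this sketch.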

The question is: can we use a single regular expression to achieve the same effect, including the information needed for not unnecessarily checking the length and for correctly setting pos a few lines later?
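
One possible shape for a single-expression version is sketched below. The class name IsoZoneNormalizer, the method normalize() and the regex itself are hypothetical and were not proposed in either of the earlier questions. The idea is to match the trailing offset once and rebuild it from the captured groups, so that a successful match both produces the Z-style string and signals that an offset was present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsoZoneNormalizer {
    // Trailing offset: a sign and two hour digits, optionally followed by
    // two minute digits with or without a ':' in front of them.
    // Covers -08, -0800 and -08:00.
    private static final Pattern ZONE =
            Pattern.compile("([+-]\\d{2})(?::?(\\d{2}))?$");

    /**
     * Returns the input with its trailing offset rewritten as +hhmm / -hhmm,
     * or null if the input does not end in a recognizable offset.
     */
    static String normalize(String date) {
        Matcher m = ZONE.matcher(date);
        if (!m.find()) {
            return null; // No offset at the end: not a candidate.
        }
        // Group 2 is absent for the hours-only form, so default to "00".
        String minutes = (m.group(2) != null) ? m.group(2) : "00";
        return date.substring(0, m.start()) + m.group(1) + minutes;
    }
}
```

Inside parse(), a null result would replace the length() comparison, and date.length() - normalized.length() (1, 0 or -2) would be the amount to add to pos after a successful super.parse(). Note that, like the lookbehind regex above, this pattern also matches a date-only string such as 2012-10-01; such input is still rejected later by super.parse(). Whether one Matcher pass is actually faster than two replaceFirst() calls would have to be measured on representative data.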

The implementation is intended for code that reads very large numbers of string fields that can be in any format (even totally non-date), selects only those which conform to the format and returns the parsed Java Date object.

So, both accuracy and speed are of paramount importance (i.e., if using the 2 passes is faster, this approach is preferable).


Solution

  • Seems that you can use this:

    import java.util.Calendar;
    import java.util.Date;
    import javax.xml.bind.DatatypeConverter;
    
    public class TestISO8601 {
        public static void main(String[] args) {
            parse("2012-10-01T19:30:00+02:00"); // UTC+2
            parse("2012-10-01T19:30:00Z");      // UTC
            parse("2012-10-01T19:30:00");       // Local
        }
        public static Date parse(final String str) {
            Calendar c = DatatypeConverter.parseDateTime(str);
            System.out.println(str + "\t" + (c.getTime().getTime()/1000));
            return c.getTime();
        }
    }