Java 7 introduced support in the SimpleDateFormat class for ISO 8601 time zone formats, via the pattern character X (instead of lower or upper case Z). Supporting such formats in Java 6 requires preprocessing, so the question is which approach is best.
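This new format is a superset of Z (uppercase Z), with 2 additional variations: the "minutes" field may be omitted, and a ':' may separate the "hours" field from the "minutes" field.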
So, as one can observe from the Java 7 documentation of SimpleDateFormat, the following 3 formats are now valid (instead of only the second one, covered by Z in Java 6) and, of course, equivalent: -08, -0800 and -08:00.
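On Java 7 or later the equivalence can be checked directly. What follows is only a minimal sketch: the class name IsoZoneLetters and the helper parseMillis are made up for illustration. Note that the number of X letters selects the variant (X for hours only, XX for hhmm, XXX for hh:mm).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class IsoZoneLetters {
    /** Parses text with the given pattern and returns epoch milliseconds. */
    static long parseMillis(String pattern, String text) {
        try {
            return new SimpleDateFormat(pattern, Locale.US).parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + text + "' with " + pattern, e);
        }
    }

    public static void main(String[] args) {
        // One pattern letter per variant: X = +hh, XX = +hhmm, XXX = +hh:mm
        long hoursOnly = parseMillis("yyyy-MM-dd'T'HH:mm:ssX", "2012-10-01T19:30:00+02");
        long noColon = parseMillis("yyyy-MM-dd'T'HH:mm:ssXX", "2012-10-01T19:30:00+0200");
        long withColon = parseMillis("yyyy-MM-dd'T'HH:mm:ssXXX", "2012-10-01T19:30:00+02:00");
        // All three strings should denote the same instant.
        System.out.println(hoursOnly == noColon && noColon == withColon);
    }
}
```

None of this is available on Java 6, which is why the rest of the question is about rewriting the input so that a Z-based pattern can handle it.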
As discussed in an earlier question about a special case of supporting such an "expanded" timezone format, always with ':' as a separator, the best approach for backporting the Java 7 functionality into Java 6 is to subclass the SimpleDateFormat class and override its parse() method, i.e.:
    public Date parse(String date, ParsePosition pos)
    {
        String iso = ... // Replace the X with a Z timezone string, using a regex
        if (iso.length() == date.length())
        {
            return null; // Not an ISO 8601 date
        }
        Date parsed = super.parse(iso, pos);
        if (parsed != null)
        {
            pos.setIndex(pos.getIndex()+1); // Adjust for ':'
        }
        return parsed;
    }
Note that the subclassed SimpleDateFormat objects above must be initialized with the corresponding Z-based pattern, i.e. if the subclass is ExtendedSimpleDateFormat and you want to parse dates complying with the pattern yyyy-MM-dd'T'HH:mm:ssX, then you should use objects instantiated as
    new ExtendedSimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
In the aforementioned earlier question the regex :(?=[0-9]{2}$) has been suggested for getting rid of the ':', and in a similar question the regex (?<=[+-]\d{2})$ has been suggested for appending the "minute" field as 00, if needed.
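To see what each of the two regexes does in isolation, here is a small sketch. The class ZoneRegexDemo and its two helper methods are hypothetical names, not part of either linked question. The last two calls show that both regexes also fire on strings without a timezone, so the subsequent parse against the Z-based pattern is what actually rejects such input.

```java
public class ZoneRegexDemo {
    /** Drops the ':' that directly precedes the last two digits of the string, if any. */
    static String dropColon(String s) {
        return s.replaceFirst(":(?=[0-9]{2}$)", "");
    }

    /** Appends "00" when the string ends with a sign followed by two digits. */
    static String appendMinutes(String s) {
        return s.replaceFirst("(?<=[+-]\\d{2})$", "00");
    }

    public static void main(String[] args) {
        System.out.println(dropColon("2012-10-01T19:30:00+02:00"));
        System.out.println(appendMinutes("2012-10-01T19:30:00+02"));
        // No timezone at all: the colon before the seconds is removed instead.
        System.out.println(dropColon("2012-10-01T19:30:00"));
        // A plain date ends with "-01", which looks like an hours-only offset.
        System.out.println(appendMinutes("2012-10-01"));
    }
}
```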
Obviously, running the 2 replacements successively can be used for achieving full functionality. So, the iso local variable in the overridden parse() method would be set as
    iso = date.replaceFirst(":(?=[0-9]{2}$)","");
or
    iso = iso.replaceFirst("(?<=[+-]\\d{2})$", "00");
with an if check in between to make sure that the pos value is also set properly later on and also for the length() comparison earlier.
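Put together, the two-pass approach might look like the sketch below. It makes a few assumptions of its own: the regexes are precompiled (String.replaceFirst compiles its regex on every call), input that needs no rewriting, such as +0200, is passed to the superclass unchanged instead of being rejected, and the position is only corrected when the whole rewritten string was consumed, which presumes that the pattern ends with Z.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;

public class ExtendedSimpleDateFormat extends SimpleDateFormat {
    private static final long serialVersionUID = 1L;

    // Compiled once and shared by all instances.
    private static final Pattern COLON = Pattern.compile(":(?=[0-9]{2}$)");
    private static final Pattern HOURS_ONLY = Pattern.compile("(?<=[+-]\\d{2})$");

    /** The pattern is expected to end with Z (not X), e.g. yyyy-MM-dd'T'HH:mm:ssZ. */
    public ExtendedSimpleDateFormat(String pattern) {
        super(pattern);
    }

    @Override
    public Date parse(String date, ParsePosition pos) {
        // Pass 1: "+02:00" -> "+0200"
        String iso = COLON.matcher(date).replaceFirst("");
        if (iso.length() == date.length()) {
            // Pass 2, only if no colon was removed: "+02" -> "+0200"
            iso = HOURS_ONLY.matcher(date).replaceFirst("00");
        }
        Date parsed = super.parse(iso, pos);
        if (parsed != null && pos.getIndex() == iso.length()) {
            // Report the end position in terms of the caller's original string.
            pos.setIndex(date.length());
        }
        return parsed;
    }

    public static void main(String[] args) throws Exception {
        ExtendedSimpleDateFormat f = new ExtendedSimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
        String[] samples = {
            "2012-10-01T19:30:00+02", "2012-10-01T19:30:00+0200", "2012-10-01T19:30:00+02:00"
        };
        for (String s : samples) {
            System.out.println(s + "\t" + f.parse(s).getTime());
        }
    }
}
```

Strings that are not dates simply fall through to super.parse(), which returns null for them, so the caller can keep using the usual null check to filter non-date fields.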
The question is: can we use a single regular expression to achieve the same effect, including the information needed for not unnecessarily checking the length and for correctly setting pos a few lines later?
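For reference, one conceivable shape for a single-regex variant is sketched here. It is untested for speed and only an assumption about what such a solution could look like; the class IsoZoneNormalizer and the method toRfc822Zone are hypothetical names. The capture groups carry the information about which form was seen: group 3 is null for an hours-only offset, and group 2 is ":" when a separator was present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsoZoneNormalizer {
    // Sign and 2-digit hours, optionally followed by (optional ':' and 2-digit minutes),
    // anchored at the end of the input.
    private static final Pattern ZONE =
            Pattern.compile("([+-]\\d{2})(?:(:?)(\\d{2}))?$");

    /**
     * Rewrites a trailing +hh, +hhmm or +hh:mm offset as +hhmm.
     * Returns the input unchanged when no such offset is found.
     */
    public static String toRfc822Zone(String date) {
        Matcher m = ZONE.matcher(date);
        if (!m.find()) {
            return date;
        }
        // Group 3 is null when the minutes were omitted.
        String minutes = m.group(3) != null ? m.group(3) : "00";
        return date.substring(0, m.start()) + m.group(1) + minutes;
    }

    public static void main(String[] args) {
        String date = "2012-10-01T19:30:00+02:00";
        String iso = toRfc822Zone(date);
        // The length difference is the correction to apply to pos after parsing:
        // +1 when a ':' was removed, -2 when "00" was appended, 0 otherwise.
        System.out.println(iso + "\t" + (date.length() - iso.length()));
    }
}
```

Whether this is faster than the two passes would have to be measured on representative input, since the single pattern does more work per call than either of the two simpler ones.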
The implementation is intended for code that reads very large numbers of string fields that can be in any format (even totally non-date), selects only those which comply with the format and returns the parsed Java Date object.
So, both accuracy and speed are of paramount importance (i.e., if using the 2 passes is faster, this approach is preferable).
It seems that you can use this:
    import java.util.Calendar;
    import java.util.Date;
    import javax.xml.bind.DatatypeConverter;

    public class TestISO8601 {
        public static void main(String[] args) {
            parse("2012-10-01T19:30:00+02:00"); // UTC+2
            parse("2012-10-01T19:30:00Z"); // UTC
            parse("2012-10-01T19:30:00"); // Local
        }

        public static Date parse(final String str) {
            // Parses an xsd:dateTime string, which follows ISO 8601
            Calendar c = DatatypeConverter.parseDateTime(str);
            // Print the input next to its value in seconds since the epoch
            System.out.println(str + "\t" + (c.getTime().getTime()/1000));
            return c.getTime();
        }
    }