Tags: java, java-time, datetimeformatter

DateTimeFormatter with the same pattern gives different results for similarly formatted date strings


I have multiple files that contain many date-time strings. For example:

File 1 :
string date = 30 Nov 2023, 18:15:53
format = dd MMM yyyy, HH:mm:ss
result => parsed successfully (2023-11-30T22:46:48)

File 2 :
string date = 10 Dec 2023, 20:23:53
format = dd MMM yyyy, HH:mm:ss
result => Text '10 Dec 2023, 20:23:53' could not be parsed at index 3

DateTimeFormatter dFormatter = DateTimeFormatter.ofPattern(formatDate);
LocalDateTime localDateTime = LocalDateTime.parse(dateTimeString, dFormatter);

Why does it produce different results? And how do I solve this?


Solution

  • Locale

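    When parsing localized formats such as yours, a Locale object determines the human language and cultural norms used in translating the name of the month, punctuation, etc. Your error is reported at index 3, which is where the month abbreviation starts. That suggests the locale in effect does not recognize Dec as a month abbreviation, while Nov happens to match.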

    If you choose not to specify a Locale, the JVM’s current default locale is used implicitly. If you have a specific locale in mind, I suggest always specifying that Locale object explicitly rather than relying on the default.

    String input = "10 Dec 2023, 20:23:53" ;  // Example value from the Question.
    Locale locale = Locale.US ;
    DateTimeFormatter f = 
        DateTimeFormatter
            .ofPattern( "dd MMM yyyy, HH:mm:ss" )
            .withLocale( locale ) ;  // ⬅️ Specify locale.
    LocalDateTime ldt = LocalDateTime.parse( input , f ) ;
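
    To see the effect of the locale for yourself, here is a minimal sketch that parses the same text under two locales. The class name LocaleParseDemo and the helper tryParse are made up for illustration, and Locale.GERMANY is only an assumed stand-in for whatever default locale your failing machine has.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical demo class, not part of the original question.
class LocaleParseDemo {
    // Pattern taken from the question.
    static final String PATTERN = "dd MMM yyyy, HH:mm:ss";

    // Returns the parsed value, or empty if the text does not match under the given locale.
    static Optional<LocalDateTime> tryParse(String text, Locale locale) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern(PATTERN).withLocale(locale);
        try {
            return Optional.of(LocalDateTime.parse(text, f));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String input = "10 Dec 2023, 20:23:53";
        // The locale a formatter picks up when none is specified.
        System.out.println("default: " + Locale.getDefault(Locale.Category.FORMAT));
        // English abbreviation "Dec" is recognized.
        System.out.println("US:      " + tryParse(input, Locale.US));
        // German abbreviates December differently, so "Dec" is rejected.
        System.out.println("GERMANY: " + tryParse(input, Locale.GERMANY));
    }
}
```

    Printing the default locale on the machine where parsing fails is usually the quickest way to confirm this diagnosis.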
    

    Locale changes

    As commented by DuncG, human language and culture evolve over time. That means the rules for a Locale will change over time as well.

    In modern Java, the default source for these rules is the Common Locale Data Repository (CLDR) published by the Unicode Consortium. Updates to Java come with updates to the CLDR. These updates mean that the localized text Java generates will change. Likewise, Java’s expectations when parsing localized text will change.

    Those facts add up to this: Your code parsing localized text may well break in future updates. For example, in the past, Locale.UK produced/expected a three-letter abbreviation of Sep for September, while more recent versions expect a four-letter Sept. Result: Your code breaks.
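
    If you want to check which abbreviation your own runtime uses, a small sketch like the following can help. The class and method names are invented for illustration, and the output is deliberately not shown because it depends on the Java version and its bundled locale data.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical helper for inspecting locale data; names are illustrative only.
class MonthAbbreviationDemo {
    // Short month name for the locale, per the TextStyle.SHORT locale data.
    static String shortName(Month month, Locale locale) {
        return month.getDisplayName(TextStyle.SHORT, locale);
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        // Result varies with the locale data shipped in this JVM.
        System.out.println("UK September: " + shortName(Month.SEPTEMBER, Locale.UK));
        System.out.println("US September: " + shortName(Month.SEPTEMBER, Locale.US));
    }
}
```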

    Solution: Stop exchanging date-time values as localized text. Use standard date-time formats only for data exchange, data storage, and logging.

    Also, do not write tests that expect to-the-letter perfect matches for localized text. Those tests will fail as CLDR rules evolve. If your code depends on to-the-letter perfect matches, your code is unrealistic and should be changed. Where you need to-the-letter perfect matches, use standard text formats.
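
    One way to test localized formatting without hard-coding the expected text is a round trip: generate the text, parse it back with the same formatter, and compare the values. This is only a sketch under my own assumptions. The class name RoundTripCheck is made up, and it assumes the value has no fractional second, since the pattern does not carry one.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical test helper; not tied to any particular test framework.
class RoundTripCheck {
    // True if formatting and then re-parsing with the same formatter yields the same value.
    // Assumes the value has zero nanoseconds, because the pattern stops at seconds.
    static boolean roundTrips(LocalDateTime value, Locale locale) {
        DateTimeFormatter f =
            DateTimeFormatter.ofPattern("dd MMM yyyy, HH:mm:ss").withLocale(locale);
        String text = f.format(value);   // e.g. "Sep" or "Sept", depending on locale data
        return LocalDateTime.parse(text, f).equals(value);
    }

    public static void main(String[] args) {
        LocalDateTime value = LocalDateTime.of(2023, 9, 10, 20, 23, 53);
        System.out.println(roundTrips(value, Locale.UK));
    }
}
```

    Such a check keeps passing when the abbreviation changes between Java versions, because it never compares against a literal string.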

    ISO 8601

    The ideal solution would be educating the publisher of your data about using only standard ISO 8601 formats when exchanging date-time values as text. Localized formats should be used only for presentation to the user, never for data storage or data exchange.

    The java.time classes use ISO 8601 formats by default when parsing/generating text, so there is no need to specify a formatting pattern.

    String input = "2023-12-10T20:23:53" ;
    LocalDateTime ldt = LocalDateTime.parse( input ) ;
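
    If you cannot change the publisher, you could convert the localized strings to ISO 8601 once, at the edge of your system, and work with standard text from there on. A minimal sketch, assuming the incoming text is English (hence Locale.US); the class name IsoMigrationDemo and the method toIso are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical converter from the question's localized format to ISO 8601 text.
class IsoMigrationDemo {
    // Assumption: the legacy files use English month abbreviations.
    static final DateTimeFormatter LEGACY =
        DateTimeFormatter.ofPattern("dd MMM yyyy, HH:mm:ss").withLocale(Locale.US);

    // Parses the localized text and re-emits it in ISO 8601 local date-time form.
    static String toIso(String legacyText) {
        return LocalDateTime.parse(legacyText, LEGACY)
                            .format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);
    }

    public static void main(String[] args) {
        String iso = toIso("10 Dec 2023, 20:23:53");
        System.out.println(iso);                       // 2023-12-10T20:23:53
        // ISO text needs no formatter and no locale to parse.
        System.out.println(LocalDateTime.parse(iso));
    }
}
```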