java locale simpledateformat azul-zulu

Different results with the same locale using SimpleDateFormat on the same Windows machine (Zulu 8)


I have to deal with SimpleDateFormat, but I have an issue with the week-year and week-of-year values.

To narrow down the problem, I wrote the simple Java code below and found that it returns two different results with apparently the same settings (the only difference is forcing the locale on the command line). The problem occurs only on a Windows (US-configured) machine: if I run the same test on a Linux (CentOS) machine, everything is OK.

The JVM on Windows is Zulu 8 OpenJDK 1.8.0_282 (I seem to get the same behavior with the Oracle JDK 8), while on Linux it is Red Hat OpenJDK 1.8.0_272.

Here is the source code:

import java.util.Locale;
import java.util.Calendar;
import java.util.TimeZone;
import java.text.SimpleDateFormat;
import java.text.DateFormat;
import java.text.ParseException;

import java.time.LocalDate;
import java.time.temporal.WeekFields;

public class TestDate {
    public static void main(String args[]) throws ParseException {
        Locale currentLocale = Locale.getDefault();

        System.out.println(System.getProperty("java.vendor"));
        System.out.println(System.getProperty("java.version"));
        System.out.println("==============");
        System.out.printf("%20s = %s%n", "getDisplayLanguage", currentLocale.getDisplayLanguage());
        System.out.printf("%20s = %s%n", "getDisplayCountry", currentLocale.getDisplayCountry());
        System.out.printf("%20s = %s%n", "getDisplayVariant", currentLocale.getDisplayVariant());

        System.out.printf("%20s = %s%n", "getLanguage", currentLocale.getLanguage());
        System.out.printf("%20s = %s%n", "getCountry", currentLocale.getCountry());

        System.out.printf("%20s = %s%n", "user.country", System.getProperty("user.country"));
        System.out.printf("%20s = %s%n", "user.language", System.getProperty("user.language"));
        System.out.printf("%20s = %s%n", "user.variant", System.getProperty("user.variant"));

        System.out.println("==============");

        Calendar c = Calendar.getInstance();
        System.out.println("1st day of week / minimal days in 1st week : " + c.getFirstDayOfWeek() + " / " + c.getMinimalDaysInFirstWeek());

        System.out.println("==============");

        LocalDate date1 = LocalDate.of(2020, 12, 31);
        LocalDate date2 = LocalDate.of(2021, 1, 1);

        DateFormat df_date = new java.text.SimpleDateFormat("dd/MM/yyyy");
        DateFormat df_week = new java.text.SimpleDateFormat("YYYY-ww");

        System.out.printf("%20s | %10s | %10s%n", "", df_date.format(java.sql.Date.valueOf(date1)), df_date.format(java.sql.Date.valueOf(date2)));
        System.out.printf("%20s | %10s | %10s%n", "SimpleDateFormat", df_week.format(java.sql.Date.valueOf(date1)), df_week.format(java.sql.Date.valueOf(date2)));

        System.out.printf("%20s | %7d-%02d | %7d-%02d%n", "WeekFields",
                                        date1.get(WeekFields.ISO.weekBasedYear()), date1.get(WeekFields.ISO.weekOfWeekBasedYear()),
                                        date2.get(WeekFields.ISO.weekBasedYear()), date2.get(WeekFields.ISO.weekOfWeekBasedYear()));

    }
}

And here are the results (the second one is the expected one):

>java TestDate
Azul Systems, Inc.
1.8.0_282
==============
  getDisplayLanguage = English
   getDisplayCountry = United States
   getDisplayVariant =
         getLanguage = en
          getCountry = US
        user.country = US
       user.language = en
        user.variant =
==============
1st day of week / minimal days in 1st week : 2 / 4
==============
                     | 31/12/2020 | 01/01/2021
    SimpleDateFormat |    2020-53 |    2020-53
          WeekFields |    2020-53 |    2020-53

>java -Duser.language=en -Duser.country=US -Duser.variant= TestDate
Azul Systems, Inc.
1.8.0_282
==============
  getDisplayLanguage = English
   getDisplayCountry = United States
   getDisplayVariant =
         getLanguage = en
          getCountry = US
        user.country = US
       user.language = en
        user.variant =
==============
1st day of week / minimal days in 1st week : 1 / 1
==============
                     | 31/12/2020 | 01/01/2021
    SimpleDateFormat |    2021-01 |    2021-01
          WeekFields |    2020-53 |    2020-53

Both runs seem to use the same locale settings, but SimpleDateFormat returns a different week year and week number. Am I missing some locale setting?

Thank you for your help.

EDIT with Oracle JDK :

>java TestDate
Oracle Corporation
1.8.0_202
==============
  getDisplayLanguage = English
   getDisplayCountry = United States
   getDisplayVariant =
         getLanguage = en
          getCountry = US
        user.country = US
       user.language = en
        user.variant =
==============
1st day of week / minimal days in 1st week : 2 / 4
==============
                     | 31/12/2020 | 01/01/2021
    SimpleDateFormat |    2020-53 |    2020-53
          WeekFields |    2020-53 |    2020-53

>java -Duser.language=en -Duser.country=US -Duser.variant= TestDate
Oracle Corporation
1.8.0_202
==============
  getDisplayLanguage = English
   getDisplayCountry = United States
   getDisplayVariant =
         getLanguage = en
          getCountry = US
        user.country = US
       user.language = en
        user.variant =
==============
1st day of week / minimal days in 1st week : 1 / 1
==============
                     | 31/12/2020 | 01/01/2021
    SimpleDateFormat |    2021-01 |    2021-01
          WeekFields |    2020-53 |    2020-53

EDIT Calendar default Locale: As pointed out by Scratte, Calendar and SimpleDateFormat use a default Locale. I had a look at the SimpleDateFormat source code: it uses Locale.getDefault(Locale.Category.FORMAT) as its default Locale, which turns out to be different from the Locale.getDefault() I printed in my code.

I have finally understood why the two runs behaved differently: I was not displaying the right Locale (I was not aware that there are 3 distinct default Locales; thank you Ole V.V. for clarifying this).

TL;DR

SimpleDateFormat uses Locale.getDefault(Locale.Category.FORMAT), whereas my Java code was displaying the values of Locale.getDefault(). The latter was always en_US, but the former was fr_FR or en_US depending on the command line I used. That is why I got two different outputs for the week / year.

In the end, the JVM parameters -Duser.language= / -Duser.country= / -Duser.variant= are the solution: they force all three default Locales.
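To see where the FORMAT locale comes from, a small diagnostic along these lines may help. It is only a sketch: the class name LocaleProps is made up, and the property names user.language.format / user.country.format (and their .display counterparts) are, as far as I can tell, what OpenJDK consults internally for the per-category defaults. They are not part of the documented API, so they may print as null or behave differently on other JVMs.

```java
import java.util.Locale;

public class LocaleProps {
    // Suffixes for the plain, DISPLAY and FORMAT variants of the user.* properties.
    // Assumption: the ".display"/".format" keys are an OpenJDK implementation detail.
    static final String[] SUFFIXES = {"", ".display", ".format"};

    // Builds a report of the user.* properties next to the three default locales.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (String suffix : SUFFIXES) {
            for (String base : new String[] {"user.language", "user.country"}) {
                String key = base + suffix;
                // System.getProperty returns null when the key is not set.
                sb.append(String.format("%-22s = %s%n", key, System.getProperty(key)));
            }
        }
        sb.append(String.format("%-22s = %s%n", "default", Locale.getDefault()));
        sb.append(String.format("%-22s = %s%n", "default(DISPLAY)",
                Locale.getDefault(Locale.Category.DISPLAY)));
        sb.append(String.format("%-22s = %s%n", "default(FORMAT)",
                Locale.getDefault(Locale.Category.FORMAT)));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it once without any -D option and once with -Duser.language=en -Duser.country=US should make it visible whether the FORMAT default follows the command line or the Windows regional settings.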

This new code shows the difference between the three default Locales:

import java.sql.Date;
import java.util.Locale;
import java.util.Calendar;
import java.util.TimeZone;
import java.text.SimpleDateFormat;
import java.text.DateFormat;
import java.text.ParseException;

import java.time.LocalDate;
import java.time.temporal.WeekFields;

public class TestDate {
    public static void main(String args[]) throws ParseException {
        Locale cL = Locale.getDefault();
        Locale cLD = Locale.getDefault(Locale.Category.DISPLAY);
        Locale cLF = Locale.getDefault(Locale.Category.FORMAT);

        System.out.println(System.getProperty("java.vendor"));
        System.out.println(System.getProperty("java.version"));
        System.out.println("==============");
        System.out.printf("%20s | %15s | %15s | %15s%n", "Locale.getDefault(.)", "", "DISPLAY", "FORMAT");
        System.out.printf("%20s | %15s | %15s | %15s%n", "getDisplayLanguage", cL.getDisplayLanguage(), cLD.getDisplayLanguage(), cLF.getDisplayLanguage());
        System.out.printf("%20s | %15s | %15s | %15s%n", "getDisplayCountry", cL.getDisplayCountry(), cLD.getDisplayCountry(), cLF.getDisplayCountry());
        System.out.printf("%20s | %15s | %15s | %15s%n", "getDisplayVariant", cL.getDisplayVariant(), cLD.getDisplayVariant(), cLF.getDisplayVariant());
        System.out.printf("%20s | %15s | %15s | %15s%n", "getLanguage", cL.getLanguage(), cLD.getLanguage(), cLF.getLanguage());
        System.out.printf("%20s | %15s | %15s | %15s%n", "getCountry", cL.getCountry(), cLD.getCountry(), cLF.getCountry());
        System.out.printf("%20s | %15s | %15s | %15s%n", "getVariant", cL.getVariant(), cLD.getVariant(), cLF.getVariant());

        System.out.printf("%20s = %s%n", "user.country", System.getProperty("user.country"));
        System.out.printf("%20s = %s%n", "user.language", System.getProperty("user.language"));
        System.out.printf("%20s = %s%n", "user.variant", System.getProperty("user.variant"));

        System.out.println("==============");

        Calendar c = Calendar.getInstance();
        System.out.println("1st day of week / minimal days in 1st week : " + c.getFirstDayOfWeek() + " / " + c.getMinimalDaysInFirstWeek());

        System.out.println("==============");

        LocalDate date1 = LocalDate.of(2020, 12, 31);
        LocalDate date2 = LocalDate.of(2021, 1, 1);

        DateFormat df_date = new java.text.SimpleDateFormat("dd/MM/yyyy");
        DateFormat df_week = new java.text.SimpleDateFormat("YYYY-ww");

        System.out.printf("%20s | %10s | %10s%n", "", df_date.format(java.sql.Date.valueOf(date1)), df_date.format(java.sql.Date.valueOf(date2)));
        System.out.printf("%20s | %10s | %10s%n", "SimpleDateFormat", df_week.format(java.sql.Date.valueOf(date1)), df_week.format(java.sql.Date.valueOf(date2)));

        System.out.printf("%20s | %7d-%02d | %7d-%02d%n", "WeekFields",
                                        date1.get(WeekFields.ISO.weekBasedYear()), date1.get(WeekFields.ISO.weekOfWeekBasedYear()),
                                        date2.get(WeekFields.ISO.weekBasedYear()), date2.get(WeekFields.ISO.weekOfWeekBasedYear()));

    }
}

And the corresponding outputs:

>java TestDate
Azul Systems, Inc.
1.8.0_282
==============
Locale.getDefault(.) |                 |         DISPLAY |          FORMAT
  getDisplayLanguage |         English |         English |          French
   getDisplayCountry |   United States |   United States |          France
   getDisplayVariant |                 |                 |
         getLanguage |              en |              en |              fr
          getCountry |              US |              US |              FR
          getVariant |                 |                 |
        user.country = US
       user.language = en
        user.variant =
==============
1st day of week / minimal days in 1st week : 2 / 4
==============
                     | 31/12/2020 | 01/01/2021
    SimpleDateFormat |    2020-53 |    2020-53
          WeekFields |    2020-53 |    2020-53

>java -Duser.language=en -Duser.country=US -Duser.variant= TestDate
Azul Systems, Inc.
1.8.0_282
==============
Locale.getDefault(.) |                 |         DISPLAY |          FORMAT
  getDisplayLanguage |         English |         English |         English
   getDisplayCountry |   United States |   United States |   United States
   getDisplayVariant |                 |                 |
         getLanguage |              en |              en |              en
          getCountry |              US |              US |              US
          getVariant |                 |                 |
        user.country = US
       user.language = en
        user.variant =
==============
1st day of week / minimal days in 1st week : 1 / 1
==============
                     | 31/12/2020 | 01/01/2021
    SimpleDateFormat |    2021-01 |    2021-01
          WeekFields |    2020-53 |    2020-53

Solution

  • I have not understood how the implementation by Talend ETL can be any of your business. If they have not yet found the opportunity for upgrading to java.time, the modern Java date and time API, it’s their problem, not yours. You should not use SimpleDateFormat nor Calendar in your own code.

    Java has got 3 default locales

    Java hasn’t just got one, it’s got three default locales, partly for historical reasons. They can be set individually. To demonstrate:

        Locale.setDefault(Locale.FRANCE);
        Locale.setDefault(Locale.Category.DISPLAY, Locale.JAPAN);
        Locale.setDefault(Locale.Category.FORMAT, Locale.GERMANY);
        
        System.out.println(Locale.getDefault());
        System.out.println(Locale.getDefault(Locale.Category.DISPLAY));
        System.out.println(Locale.getDefault(Locale.Category.FORMAT));
    

    Output from this snippet is:

    fr_FR
    ja_JP
    de_DE
    

    The output reflects, in order, France, Japan and Germany (deutsch/Deutschland).

    Your comment states that the code of SimpleDateFormat uses the default FORMAT locale as its default locale (so Germany in my example). That is, the locale that it uses when you don’t specify one (you shouldn’t use SimpleDateFormat; if you do nevertheless, you should always specify the locale explicitly).
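If SimpleDateFormat cannot be avoided, passing the locale to the constructor might look like the following. This is a minimal sketch rather than a recommendation; the class and method names are invented for illustration, and the date is built with GregorianCalendar in the JVM's default time zone, which SimpleDateFormat also uses by default.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class ExplicitLocaleWeek {
    // Formats week year and week number using the week rules of the given locale,
    // independent of any default locale of the JVM.
    static String weekOf(Date date, Locale locale) {
        return new SimpleDateFormat("YYYY-ww", locale).format(date);
    }

    public static void main(String[] args) {
        // 31 December 2020 at midnight in the default time zone.
        Date newYearsEve = new GregorianCalendar(2020, Calendar.DECEMBER, 31).getTime();
        System.out.println("US:     " + weekOf(newYearsEve, Locale.US));     // 2021-01
        System.out.println("FRANCE: " + weekOf(newYearsEve, Locale.FRANCE)); // 2020-53
    }
}
```

The two results correspond to the two outputs in the question: US week rules (week starts on Sunday, 1 minimal day) versus French week rules (week starts on Monday, 4 minimal days).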

    As I said, the three can be set individually. The one-arg Locale.setDefault() sets all three, though.
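The effect of the one-arg method can be checked with a sketch like this one. The class and method names are hypothetical; the code restores the original defaults afterwards so that it does not disturb the rest of the JVM.

```java
import java.util.Locale;

public class OneArgSetDefault {
    // Returns the three defaults as "default/DISPLAY/FORMAT".
    static String snapshot() {
        return Locale.getDefault() + "/"
                + Locale.getDefault(Locale.Category.DISPLAY) + "/"
                + Locale.getDefault(Locale.Category.FORMAT);
    }

    static String demo() {
        // Remember the current defaults so they can be restored.
        Locale plain = Locale.getDefault();
        Locale display = Locale.getDefault(Locale.Category.DISPLAY);
        Locale format = Locale.getDefault(Locale.Category.FORMAT);
        try {
            // Set only the FORMAT default first ...
            Locale.setDefault(Locale.Category.FORMAT, Locale.GERMANY);
            // ... then the one-arg call overwrites all three, FORMAT included.
            Locale.setDefault(Locale.FRANCE);
            return snapshot();
        } finally {
            Locale.setDefault(plain);
            Locale.setDefault(Locale.Category.DISPLAY, display);
            Locale.setDefault(Locale.Category.FORMAT, format);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // fr_FR/fr_FR/fr_FR
    }
}
```

So the order of the calls matters: a category-specific default only survives if it is set after the one-arg call.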

    Does this observation explain? On my Java 11 it seems that setting the locale on the command line sets all three default locales (until altered by Locale.setDefault()). I tried just

        System.out.println(Locale.getDefault());
        System.out.println(Locale.getDefault(Locale.Category.DISPLAY));
        System.out.println(Locale.getDefault(Locale.Category.FORMAT));
    

    I ran this snippet with -Duser.language=en -Duser.country=US on the command line, and the output was:

    en_US
    en_US
    en_US
    

    Other language and country settings also came through in all three locales. So no, this alone doesn’t explain why your SimpleDateFormat in one case did not seem to pick up the locale from the command line.

    Does this observation provide a solution?

    I still haven’t understood what your real end goal is. The first recommendation is: your code should not rely on the default locale of the JVM. Use an explicit locale in your locale-sensitive operations.
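With java.time, one way to avoid any dependence on a default locale is to name the week rules explicitly. The following is a sketch under that assumption; the class and method names are made up, and it uses the predefined WeekFields.ISO and WeekFields.SUNDAY_START rather than anything derived from the JVM settings.

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class ExplicitWeekFields {
    // Formats week-based year and week number under the given, explicit week rules.
    static String weekOf(LocalDate date, WeekFields rules) {
        return String.format(Locale.ROOT, "%04d-%02d",
                date.get(rules.weekBasedYear()),
                date.get(rules.weekOfWeekBasedYear()));
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, 12, 31);
        // ISO rules: week starts on Monday, first week has at least 4 days.
        System.out.println("ISO:          " + weekOf(date, WeekFields.ISO));          // 2020-53
        // Sunday start, first week has at least 1 day (the rules used for en_US).
        System.out.println("SUNDAY_START: " + weekOf(date, WeekFields.SUNDAY_START)); // 2021-01
    }
}
```

If locale-specific rules are wanted after all, WeekFields.of(Locale) accepts the locale as an argument, which still keeps the choice visible in the code.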

    If you do need to set the default FORMAT locale for Talend ETL to work the way you require it to, Locale.setDefault(Locale.Category.FORMAT, Locale.US); should do it.

    Link

    Related question: Which "default Locale" is which?