[SOLVED] What is the difference between C.UTF-8 and en

What is the difference between C.UTF-8 and en_US.UTF-8 locales?

I'm migrating a Python application from an Ubuntu server with an en_US.UTF-8 locale to a new Debian server, which comes with C.UTF-8 set by default. I'm trying to understand whether this change could have any impact.

Solution

In general, C is for computers, while en_US is for people in the US who speak English (and for anyone else who wants the same behaviour).

"For computers" means that strings are more standardized (though still in English), so the output of one program can be read reliably by another program. With en_US, messages could be reworded and alphabetic order could be revised over time (maybe to follow new rules in the Chicago Manual of Style, etc.). So it is more user-friendly, but possibly less stable. Note: locales are not just for translating strings. They also cover collation (alphabetic order), number formatting (e.g. the thousands separator), currency (I think it is safe to predict that $ and 2 decimal digits will remain), month names, days of the week, etc.
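To make the collation point concrete, here is a minimal Python sketch. The helper name sort_with_locale and the word list are made up for illustration, and the code assumes nothing about which locales are installed: it returns None when a locale is missing.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the collation rules of locale_name.

    Returns None if that locale is not installed on this machine.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the original setting so the rest of the program is unaffected.
        locale.setlocale(locale.LC_COLLATE, previous)


words = ["banana", "Zebra", "apple", "Cherry"]
print("C:          ", sort_with_locale(words, "C"))
print("en_US.UTF-8:", sort_with_locale(words, "en_US.UTF-8"))
```

In the C locale the order follows code points, so uppercase words come before lowercase ones. With en_US.UTF-8 (if installed) you will typically get a dictionary-like order instead, but the exact result depends on the C library's locale data. Note that Python's built-in sorted() without a key always uses code point order, whatever the locale is.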

In your case, you are simply comparing the UTF-8 variants of both locales.

In general it should not matter. I usually prefer en_US.UTF-8, but in your case (a server app) it should only change log and error messages, and only if you call locale.setlocale(). You should handle client locales inside your app. Programs that read the output of other programs should set the C locale before opening the pipe, so that part should not be affected either.
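As a rough sketch of "set C before opening the pipe" in Python: the helper name run_with_c_locale is hypothetical, and the child command is only a stand-in that prints the variable it received.

```python
import os
import subprocess
import sys


def run_with_c_locale(cmd):
    """Run cmd with LC_ALL=C and return its stdout as text."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # LC_ALL overrides LANG and all other LC_* variables
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


# Stand-in child process: it just reports the LC_ALL value it was given.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
print(run_with_c_locale(child))
```

This way the parsing code does not depend on whether the server default is C.UTF-8 or en_US.UTF-8. If the child has to emit non-ASCII text, setting LC_ALL to C.UTF-8 instead may be the better choice, provided that locale exists on the machine.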

As you can see, it probably doesn't matter. You may also use the POSIX locale (an alias for C), which is also defined in Debian. You can list the installed locales with locale -a.
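If you would rather check from inside the application than from the shell, something like the following may work. It is only a sketch: locale_is_installed is a made-up helper, and it simply tries to activate the locale and restores the previous setting afterwards.

```python
import locale


def locale_is_installed(name):
    """Return True if the C library accepts name as a locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    locale.setlocale(locale.LC_CTYPE, saved)  # undo the temporary switch
    return True


for name in ("C", "C.UTF-8", "en_US.UTF-8"):
    print(name, locale_is_installed(name))
```

Running this on both the old and the new server shows quickly whether en_US.UTF-8 is still available after the migration.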

Note: from a micro-optimization point of view, C/C.UTF-8 is the better choice: no message translation lookups (gettext), and simple rules for collation and number formatting. But the difference should be visible only on the server side.