[SOLVED] from user's language to the locale string

In our Django/Python/Linux stack we want to determine the correct locale from the user's language. The language might be 'de' and the locale could be something like de_DE or de_AT or even de_CH.UTF-8 - depending on what locale -a returns. In case of ambiguity we would just use the first valid entry for the respective language.

This locale string shall than be used to determine the correct number format for instance like so:

    locale.setlocale(locale.LC_ALL, user.language)
    formattedValue = locale.format_string(f'%.{decimals}f', val=gram, grouping=True)

We don't want to create a relation between language code and locale string in our application - so defining a dictionary containing languages and locale strings is not something we have in mind. The information should be retrieved generically.

For instance, on my local system I have these locales available:

stephan@rechenmonster:~$ locale -a
C
C.utf8
de_AT.utf8
de_BE.utf8
de_CH.utf8
de_DE.utf8
de_IT.utf8
de_LI.utf8
de_LU.utf8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
POSIX
stephan@rechenmonster:~$

In python many more are listed - not only the available ones:

>>> locale.locale_alias.keys()
dict_keys(['a3', 'a3_az', 'a3_az.koic', 'aa_dj', 'aa_er', 'aa_et', 'af', 'af_za', 'agr_pe', 'ak_gh', 'am', 'am_et', 'american', 'an_es', 'anp_in', 'ar', 'ar_aa', 'ar_ae', 'ar_bh', 'ar_dz', 'ar_eg', 'ar_in', 'ar_iq', 'ar_jo', 'ar_kw', 'ar_lb', 'ar_ly', 'ar_ma', 'ar_om', 'ar_qa', 'ar_sa', 'ar_sd', 'ar_ss', 'ar_sy', 'ar_tn', 'ar_ye', 'arabic', 'as', 'as_in', 'ast_es', 'ayc_pe', 'az', 'az_az', 'az_az.iso88599e', 'az_ir', 'be', 'be@latin', 'be_bg.utf8', 'be_by', 'be_by@latin', 'bem_zm', 'ber_dz', 'ber_ma', 'bg', 'bg_bg', 'bhb_in.utf8', 'bho_in', 'bho_np', 'bi_vu', 'bn_bd', 'bn_in', 'bo_cn', 'bo_in', 'bokmal', 'bokmål', 'br', 'br_fr', 'brx_in', 'bs', 'bs_ba', 'bulgarian', 'byn_er', 'c', 'c-french', 'c.ascii', 'c.en', 'c.iso88591', 'c.utf8', 'c_c', 'c_c.c', 'ca', 'ca_ad', 'ca_es', 'ca_es@valencia', 'ca_fr', 'ca_it', 'catalan', 'ce_ru', 'cextend', 'chinese-s', 'chinese-t', 'chr_us', 'ckb_iq', 'cmn_tw', 'crh_ua', 'croatian', 'cs', 'cs_cs', 'cs_cz', 'csb_pl', 'cv_ru', 'cy', 'cy_gb', 'cz', 'cz_cz', 'czech', 'da', 'da_dk', 'danish', 'dansk', 'de', 'de_at', 'de_be', 'de_ch', 'de_de', 'de_it', 'de_li.utf8', 'de_lu', 'deutsch', 'doi_in', 'dutch', 'dutch.iso88591', 'dv_mv', 'dz_bt', 'ee', 'ee_ee', 'eesti', 'el', 'el_cy', 'el_gr', 'el_gr@euro', 'en', 'en_ag', 'en_au', 'en_be', 'en_bw', 'en_ca', 'en_dk', 'en_dl.utf8', 'en_gb', 'en_hk', 'en_ie', 'en_il', 'en_in', 'en_ng', 'en_nz', 'en_ph', 'en_sc.utf8', 'en_sg', 'en_uk', 'en_us', 'en_us@euro@euro', 'en_za', 'en_zm', 'en_zw', 'en_zw.utf8', 'eng_gb', 'english', 'english.iso88591', 'english_uk', 'english_united-states', 'english_united-states.437', 'english_us', 'eo', 'eo.utf8', 'eo_eo', 'eo_us.utf8', 'eo_xx', 'es', 'es_ar', 'es_bo', 'es_cl', 'es_co', 'es_cr', 'es_cu', 'es_do', 'es_ec', 'es_es', 'es_gt', 'es_hn', 'es_mx', 'es_ni', 'es_pa', 'es_pe', 'es_pr', 'es_py', 'es_sv', 'es_us', 'es_uy', 'es_ve', 'estonian', 'et', 'et_ee', 'eu', 'eu_es', 'eu_fr', 'fa', 'fa_ir', 'fa_ir.isiri3342', 'ff_sn', 'fi', 'fi_fi', 'fil_ph', 'finnish', 'fo', 'fo_fo', 'fr', 'fr_be', 'fr_ca', 'fr_ch', 'fr_fr', 'fr_lu', 'français', 'fre_fr', 'french', 'french.iso88591', 'french_france', 'fur_it', 'fy_de', 'fy_nl', 'ga', 'ga_ie', 'galego', 'galician', 'gd', 'gd_gb', 'ger_de', 'german', 'german.iso88591', 'german_germany', 'gez_er', 'gez_et', 'gl', 'gl_es', 'greek', 'gu_in', 'gv', 'gv_gb', 'ha_ng', 'hak_tw', 'he', 'he_il', 'hebrew', 'hi', 'hi_in', 'hi_in.isciidev', 'hif_fj', 'hne', 'hne_in', 'hr', 'hr_hr', 'hrvatski', 'hsb_de', 'ht_ht', 'hu', 'hu_hu', 'hungarian', 'hy_am', 'hy_am.armscii8', 'ia', 'ia_fr', 'icelandic', 'id', 'id_id', 'ig_ng', 'ik_ca', 'in', 'in_id', 'is', 'is_is', 'iso-8859-1', 'iso-8859-15', 'iso8859-1', 'iso8859-15', 'iso_8859_1', 'iso_8859_15', 'it', 'it_ch', 'it_it', 'italian', 'iu', 'iu_ca', 'iu_ca.nunacom8', 'iw', 'iw_il', 'iw_il.utf8', 'ja', 'ja_jp', 'ja_jp.euc', 'ja_jp.mscode', 'ja_jp.pck', 'japan', 'japanese', 'japanese-euc', 'japanese.euc', 'jp_jp', 'ka', 'ka_ge', 'ka_ge.georgianacademy', 'ka_ge.georgianps', 'ka_ge.georgianrs', 'kab_dz', 'kk_kz', 'kl', 'kl_gl', 'km_kh', 'kn', 'kn_in', 'ko', 'ko_kr', 'ko_kr.euc', 'kok_in', 'korean', 'korean.euc', 'ks', 'ks_in', 'ks_in@devanagari.utf8', 'ku_tr', 'kw', 'kw_gb', 'ky', 'ky_kg', 'lb_lu', 'lg_ug', 'li_be', 'li_nl', 'lij_it', 'lithuanian', 'ln_cd', 'lo', 'lo_la', 'lo_la.cp1133', 'lo_la.ibmcp1133', 'lo_la.mulelao1', 'lt', 'lt_lt', 'lv', 'lv_lv', 'lzh_tw', 'mag_in', 'mai', 'mai_in', 'mai_np', 'mfe_mu', 'mg_mg', 'mhr_ru', 'mi', 'mi_nz', 'miq_ni', 'mjw_in', 'mk', 'mk_mk', 'ml', 'ml_in', 'mn_mn', 'mni_in', 'mr', 'mr_in', 'ms', 'ms_my', 'mt', 'mt_mt', 'my_mm', 'nan_tw', 'nb', 'nb_no', 'nds_de', 'nds_nl', 'ne_np', 'nhn_mx', 'niu_nu', 'niu_nz', 'nl', 'nl_aw', 'nl_be', 'nl_nl', 'nn', 'nn_no', 'no', 'no@nynorsk', 'no_no', 'no_no.iso88591@bokmal', 'no_no.iso88591@nynorsk', 'norwegian', 'nr', 'nr_za', 'nso', 'nso_za', 'ny', 'ny_no', 'nynorsk', 'oc', 'oc_fr', 'om_et', 'om_ke', 'or', 'or_in', 'os_ru', 'pa', 'pa_in', 'pa_pk', 'pap_an', 'pap_aw', 'pap_cw', 'pd', 'pd_de', 'pd_us', 'ph', 'ph_ph', 'pl', 'pl_pl', 'polish', 'portuguese', 'portuguese_brazil', 'posix', 'posix-utf2', 'pp', 'pp_an', 'ps_af', 'pt', 'pt_br', 'pt_pt', 'quz_pe', 'raj_in', 'ro', 'ro_ro', 'romanian', 'ru', 'ru_ru', 'ru_ua', 'rumanian', 'russian', 'rw', 'rw_rw', 'sa_in', 'sat_in', 'sc_it', 'sd', 'sd_in', 'sd_in@devanagari.utf8', 'sd_pk', 'se_no', 'serbocroatian', 'sgs_lt', 'sh', 'sh_ba.iso88592@bosnia', 'sh_hr', 'sh_hr.iso88592', 'sh_sp', 'sh_yu', 'shn_mm', 'shs_ca', 'si', 'si_lk', 'sid_et', 'sinhala', 'sk', 'sk_sk', 'sl', 'sl_cs', 'sl_si', 'slovak', 'slovene', 'slovenian', 'sm_ws', 'so_dj', 'so_et', 'so_ke', 'so_so', 'sp', 'sp_yu', 'spanish', 'spanish_spain', 'sq', 'sq_al', 'sq_mk', 'sr', 'sr@cyrillic', 'sr@latn', 'sr_cs', 'sr_cs.iso88592@latn', 'sr_cs@latn', 'sr_me', 'sr_rs', 'sr_rs@latn', 'sr_sp', 'sr_yu', 'sr_yu.cp1251@cyrillic', 'sr_yu.iso88592', 'sr_yu.iso88595', 'sr_yu.iso88595@cyrillic', 'sr_yu.microsoftcp1251@cyrillic', 'sr_yu.utf8', 'sr_yu.utf8@cyrillic', 'sr_yu@cyrillic', 'ss', 'ss_za', 'st', 'st_za', 'sv', 'sv_fi', 'sv_se', 'sw_ke', 'sw_tz', 'swedish', 'szl_pl', 'ta', 'ta_in', 'ta_in.tscii', 'ta_in.tscii0', 'ta_lk', 'tcy_in.utf8', 'te', 'te_in', 'tg', 'tg_tj', 'th', 'th_th', 'th_th.tactis', 'th_th.tis620', 'thai', 'the_np', 'ti_er', 'ti_et', 'tig_er', 'tk_tm', 'tl', 'tl_ph', 'tn', 'tn_za', 'to_to', 'tpi_pg', 'tr', 'tr_cy', 'tr_tr', 'ts', 'ts_za', 'tt', 'tt_ru', 'tt_ru.tatarcyr', 'tt_ru@iqtelif', 'turkish', 'ug_cn', 'uk', 'uk_ua', 'univ', 'universal', 'universal.utf8@ucs4', 'unm_us', 'ur', 'ur_in', 'ur_pk', 'uz', 'uz_uz', 'uz_uz@cyrillic', 've', 've_za', 'vi', 'vi_vn', 'vi_vn.tcvn', 'vi_vn.tcvn5712', 'vi_vn.viscii', 'vi_vn.viscii111', 'wa', 'wa_be', 'wae_ch', 'wal_et', 'wo_sn', 'xh', 'xh_za', 'yi', 'yi_us', 'yo_ng', 'yue_hk', 'yuw_pg', 'zh', 'zh_cn', 'zh_cn.big5', 'zh_cn.euc', 'zh_hk', 'zh_hk.big5hk', 'zh_sg', 'zh_sg.gbk', 'zh_tw', 'zh_tw.euc', 'zh_tw.euctw', 'zu', 'zu_za'])
>>>

So I guess it comes down to having the output of locale -a available in python. I've found now basically to ways of doing this:

running a subprocess of locale -a and fetching the output of the call
getting all locales in python with locale.locale_alias, filtering for the ones of the language in question and and testing each entry with locale.set_locale() and use the first one that does not raise an error.

Are these really the only two options? Is there no better way of doing this?

from user's language to the locale string - how to?