
Is there a list of BCP 47 language codes in R?


I'm running the fantastic pandoc from within an R package, relying on the LaTeX babel package for some typesetting niceties.

Pandoc expects a lang argument as a BCP 47 code (e.g. en-US), but babel expects its own language codes (e.g. american).

Pandoc, being as awesome as it is, maps between the two in this Haskell script.

In the spirit of defensive programming, I'd like to warn my users when they're using a wrong language code, and give them a definitive list of such acceptable BCP 47 codes.

Does such a list (or vector, or whatever) exist somewhere in R or a package for convenient use?

I'm trying to avoid manually typing up the pandoc Haskell script.

Related links:

https://www.w3.org/International/articles/language-tags/

https://search.r-project.org/CRAN/refmans/NLP/html/language.html


Solution

  • I needed to provide a convenient selectize input, so I had to have the available options ready in R and ended up hand-copying them (yikes).

    In case anyone finds this useful, the table below has these columns:

    - lang_short: the short language code,
    - var_short: the variant or locale,
    - lang_long: a longer name for the language, helpful for input (possibly non-standard!),
    - var_long: a longer name for the variant or locale, helpful for input (probably non-standard!),
    - polyglossia and babel: logical values indicating whether pandoc maps the code to one or both of these LaTeX packages (might come in handy if you need to rely on only one of them).

    Remember that pandoc expects language tags of the form en-US etc., so you need to paste columns 1 and 2 together, separated by a hyphen.

    Remember that these are not all languages and variants under the BCP 47 standard; it's just the (small) subset mapped by pandoc.

    (If anyone comes across a more definitive list of language codes in R, that would be great).


    lang_short;var_short;lang_long;var_long;polyglossia;babel
    ar;DZ;arabic;Algeria;TRUE;FALSE
    ar;IQ;arabic;Iraq;TRUE;FALSE
    ar;JO;arabic;Jordan;TRUE;FALSE
    ar;LB;arabic;Lebanon;TRUE;FALSE
    ar;LY;arabic;Libya;TRUE;FALSE
    ar;MA;arabic;Morocco;TRUE;FALSE
    ar;MR;arabic;Mauritania;TRUE;FALSE
    ar;PS;arabic;Palestinian Territory;TRUE;FALSE
    ar;SY;arabic;Syria;TRUE;FALSE
    ar;TN;arabic;Tunisia;TRUE;FALSE
    de;DE;german;;TRUE;TRUE
    de;AT;german;Austria;TRUE;TRUE
    de;CH;german;Switzerland;TRUE;TRUE
    dsb;;lower sorbian;;TRUE;FALSE
    hsb;;upper sorbian;;FALSE;TRUE
    el;polyton;greek;polytonic;TRUE;TRUE
    en;AU;english;Australia;TRUE;TRUE
    en;CA;english;Canada;TRUE;TRUE
    en;GB;english;Great Britain;TRUE;TRUE
    en;NZ;english;New Zealand;TRUE;TRUE
    en;UK;english;United Kingdom;TRUE;TRUE
    en;US;english;United States;TRUE;TRUE
    grc;ancient;greek;ancient;TRUE;TRUE
    la;;latin;;TRUE;TRUE
    sl;;slovenian;;TRUE;TRUE
    fr;CA;french;Canada;FALSE;TRUE
    pt;BR;portuguese;Brazil;TRUE;TRUE
    sr;;serbian;;TRUE;TRUE
    af;;afrikaans;;TRUE;TRUE
    am;;amharic;;TRUE;TRUE
    ar;;arabic;;TRUE;TRUE
    as;;assamese;;TRUE;TRUE
    ast;;asturian;;TRUE;TRUE
    bg;;bulgarian;;TRUE;TRUE
    bn;;bengali;;TRUE;TRUE
    bo;;tibetan;;TRUE;TRUE
    br;;breton;;TRUE;TRUE
    ca;;catalan;;TRUE;TRUE
    cy;;welsh;;TRUE;TRUE
    cs;;czech;;TRUE;TRUE
    cop;;coptic;;TRUE;TRUE
    da;;danish;;TRUE;TRUE
    dv;;divehi;;TRUE;TRUE
    el;;greek;;TRUE;TRUE
    en;;english;;TRUE;TRUE
    eo;;esperanto;;TRUE;TRUE
    es;;spanish;;TRUE;TRUE
    et;;estonian;;TRUE;TRUE
    eu;;basque;;TRUE;TRUE
    fa;;farsi;;TRUE;TRUE
    fr;;french;;TRUE;TRUE
    fur;;friulan;;TRUE;TRUE
    ga;;irish;;TRUE;TRUE
    gd;;scottish;;TRUE;TRUE
    gez;;ethiopic;;TRUE;TRUE
    gl;;galician;;TRUE;TRUE
    he;;hebrew;;TRUE;TRUE
    hi;;hindi;;TRUE;TRUE
    hr;;croatian;;TRUE;TRUE
    hu;;magyar;;TRUE;TRUE
    hy;;armenian;;TRUE;TRUE
    ia;;interlingua;;TRUE;TRUE
    id;;indonesian;;TRUE;TRUE
    is;;icelandic;;TRUE;TRUE
    it;;italian;;TRUE;TRUE
    km;;khmer;;TRUE;TRUE
    kmr;;kurmanji;;TRUE;TRUE
    kn;;kannada;;TRUE;TRUE
    ko;;korean;;TRUE;TRUE
    lo;;lao;;TRUE;TRUE
    lt;;lithuanian;;TRUE;TRUE
    lv;;latvian;;TRUE;TRUE
    ml;;malayalam;;TRUE;TRUE
    mn;;mongolian;;TRUE;TRUE
    mr;;marathi;;TRUE;TRUE
    nb;;norsk;;TRUE;TRUE
    nl;;dutch;;TRUE;TRUE
    nn;;nynorsk;;TRUE;TRUE
    no;;norsk;;TRUE;TRUE
    nqo;;nko;;TRUE;TRUE
    oc;;occitan;;TRUE;TRUE
    pa;;panjabi;;TRUE;TRUE
    pms;;piedmontese;;TRUE;TRUE
    pt;;portuguese;;TRUE;TRUE
    rm;;romansh;;TRUE;TRUE
    ro;;romanian;;TRUE;TRUE
    ru;;russian;;TRUE;TRUE
    sa;;sanskrit;;TRUE;TRUE
    se;;samin;;TRUE;TRUE
    sk;;slovak;;TRUE;TRUE
    sq;;albanian;;TRUE;TRUE
    syr;;syriac;;TRUE;TRUE
    ta;;tamil;;TRUE;TRUE
    te;;telugu;;TRUE;TRUE
    th;;thai;;TRUE;TRUE
    ti;;ethiopic;;TRUE;TRUE
    tk;;turkmen;;TRUE;TRUE
    tr;;turkish;;TRUE;TRUE
    uk;;ukrainian;;TRUE;TRUE
    ur;;urdu;;TRUE;TRUE
    vi;;vietnamese;;TRUE;TRUE
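
To tie this back to the original question (warning users about an unknown code), here is a minimal sketch of how the table could be used from R. It is only an illustration: `lang_csv`, `langs`, the `tag` column and `check_lang()` are names made up for this example, and only three rows of the table are inlined so that the snippet runs on its own. In practice you would read the full table from a file.

```r
# Three-row excerpt of the table above (the full table would normally
# live in a file shipped with the package).
lang_csv <- "lang_short;var_short;lang_long;var_long;polyglossia;babel
de;AT;german;Austria;TRUE;TRUE
en;US;english;United States;TRUE;TRUE
fr;;french;;TRUE;TRUE"

# Empty character fields are read as "" (not NA), which is used below.
langs <- read.csv(text = lang_csv, sep = ";", stringsAsFactors = FALSE)

# Paste columns 1 and 2 with a hyphen; rows without a variant keep the
# bare language code.
langs$tag <- ifelse(
  langs$var_short == "",
  langs$lang_short,
  paste(langs$lang_short, langs$var_short, sep = "-")
)

# Hypothetical helper: warn when `lang` is not one of the known tags.
# Returns TRUE or FALSE invisibly so callers can branch on the result.
check_lang <- function(lang, known = langs$tag) {
  stopifnot(is.character(lang), length(lang) == 1)
  ok <- lang %in% known
  if (!ok) {
    warning(
      "'", lang, "' is not among the language tags listed here. ",
      "Known tags: ", paste(known, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(ok)
}
```

With this excerpt, `check_lang("en-US")` passes silently, while `check_lang("en_US")` raises a warning that lists the known tags. The same `langs$tag` vector could also serve as the `choices` of a selectize input. Keep in mind that this only checks membership in the hand-copied subset, not validity under BCP 47 as a whole.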