javascript locale iso-3166 ietf-bcp-47

Getting the user's region with navigator.language


For some time, I've been using something like this to get my user's country (ISO-3166):

const region = navigator.language.split('-')[1]; // 'US'

I've always assumed the string would look like en-US, with the country as the second element of the split array.

I now think this assumption is incorrect. According to the MDN docs, navigator.language returns a "string representing the language version as defined in BCP 47." Reading BCP 47, the primary language subtag is guaranteed to come first (e.g., 'en'), but the region code is not guaranteed to be the second subtag. Other subtags can both precede and follow the region subtag.

For example "sr-Latn-RS" is a valid BCP 47 language tag:

sr                |  Latn           |  RS
primary language  |  script subtag  |  region subtag
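
To make the failure concrete, here is a minimal sketch of how the split-based approach behaves on such tags. The helper name naiveRegion is hypothetical and used only for illustration.

```javascript
// Hypothetical helper reproducing the approach above: take the second subtag.
function naiveRegion(tag) {
  return tag.split('-')[1];
}

console.log(naiveRegion('en-US'));      // 'US'
console.log(naiveRegion('sr-Latn-RS')); // 'Latn' (the script subtag, not the region)
console.log(naiveRegion('en'));         // undefined (no second subtag at all)
```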

Is the value returned from navigator.language a subset of BCP 47 containing only language and region? Or is there a library or regex that is commonly used to extract the region subtag from a language tag?


Solution

  • This regex, taken from https://github.com/gagle/node-bcp47/blob/master/lib/index.js, parses a BCP 47 tag into its subtags. The region, when present, is capture group 5:

    var re = /^(?:(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)|(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang))$|^((?:[a-z]{2,3}(?:(?:-[a-z]{3}){1,3})?)|[a-z]{4}|[a-z]{5,8})(?:-([a-z]{4}))?(?:-([a-z]{2}|\d{3}))?((?:-(?:[\da-z]{5,8}|\d[\da-z]{3}))*)?((?:-[\da-wy-z](?:-[\da-z]{2,8})+)*)?(-x(?:-[\da-z]{1,8})+)?$|^(x(?:-[\da-z]{1,8})+)$/i;
    
    let foo = re.exec('de-AT');      // German in Austria
    let bar = re.exec('zh-Hans-CN'); // Chinese in Simplified script, as used in mainland China
    
    console.log(`region ${foo[5]}`); // 'region AT'
    console.log(`region ${bar[5]}`); // 'region CN'
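
As a possible alternative to the regex, environments that support the built-in Intl.Locale API (current browsers and recent Node.js versions) can parse the tag directly. This is a sketch under that assumption, not part of the original answer. The helper name regionOf is hypothetical.

```javascript
// Hypothetical helper: returns the region subtag of a BCP 47 tag,
// or undefined when the tag has no region or cannot be parsed.
function regionOf(tag) {
  try {
    return new Intl.Locale(tag).region;
  } catch (err) {
    // Intl.Locale throws a RangeError for structurally invalid tags.
    return undefined;
  }
}

console.log(regionOf('de-AT'));      // 'AT'
console.log(regionOf('zh-Hans-CN')); // 'CN'
console.log(regionOf('sr-Latn-RS')); // 'RS'
console.log(regionOf('en'));         // undefined
```

In a browser this would be called as regionOf(navigator.language). Note that a tag such as 'en' carries no region at all, so the caller still needs a fallback for the undefined case.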