localization internationalization

Localization of lists


What is the right way to localize a list of strings? I know that the separator can be localized to a comma or a semicolon, but does the conjunction get localized too? If so, what would my format string for an arbitrary-length list look like?

Example

Take "Bat, Cat and Dog". I could use the locale's separator and construct the LIST as follows:

LIST := UNIT
LISTMID := UNIT SEPARATOR UNIT
LISTMID := LISTMID SEPARATOR UNIT
LIST := UNIT CONJUNCTION UNIT
LIST := LISTMID CONJUNCTION UNIT

Would I have to craft this rule per language? Any libraries available to help with this?


Solution

  • I came here looking for an answer to the same question, and ended up doing more googling, which found this: http://icu-project.org/apiref/icu4j/com/ibm/icu/text/ListFormatter.html

    The class takes four patterns as parameters: two, start, middle, and end.

    So, for English, that would be:

     - TWO := "{0} and {1}"
     - START := "{0}, {1}"
     - MIDDLE := "{0}, {1}" 
     - END := "{0} and {1}"
    

    I wrote a quick Lua demonstration of how I imagine this works:

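    function list_format(words, templates)
        local length = #words
        -- Short lists: nothing to join, or a single pair joined by 'two'.
        if length == 0 then return '' end
        if length == 1 then return words[1] end
        if length == 2 then
            return replace(replace(templates['two'], '{0}', words[1]),
                '{1}', words[2])
        end
    
        -- Build from the right: 'end' places the last word and leaves '{0}' open.
        local result = replace(templates['end'], '{1}', words[length])
        -- Wrap the words before the last, down to words[3], in 'middle'.
        while length > 3 do
            length = length - 1
            local mid = replace(templates['middle'], '{1}', words[length])
            result = replace(result, '{0}', mid)
        end
        -- words[2] fills the last open slot; 'start' then prepends words[1].
        result = replace(result, '{0}', words[2])
        result = replace(templates['start'], '{1}', result)
        result = replace(result, '{0}', words[1])
        return result
    end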
    
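    -- Replace every occurrence of a placeholder such as '{0}' in the template.
    function replace(template, index, text)
        -- A function replacement keeps '%' in text from being read as a gsub escape.
        local str = string.gsub(template, index, function() return text end)
        return str
    end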
    
    local english = {
        ["two"] = "{0} and {1}",
        ["start"] = "{0}, {1}",
        ["middle"] = "{0}, {1}",
        ["end"] = "{0} and {1}"
    }
    
    print(list_format({"banana"}, english))
    print(list_format({"banana", "apple"}, english))
    print(list_format({"banana", "apple", "mango"}, english))
    print(list_format({"banana", "apple", "mango", "pineapple"}, english))
    

    Adapting this to another language should mostly be a matter of supplying that language's four patterns.
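
As a rough illustration of adapting the approach to another language, here is a minimal sketch that applies the same two/start/middle/end idea to a German pattern table. The names `fill`, `format_list`, and `german` are hypothetical, and the German patterns are written by hand for illustration rather than copied from any library; real patterns should come from locale data such as CLDR. This version folds the list from the right instead of patching placeholders in place, and is not necessarily how ICU implements it.

```lua
-- Substitute {0} and {1} in a pattern in a single pass.
-- The function replacement means "%" or "{0}" inside an item is left alone.
local function fill(pattern, first, second)
    local args = { ["0"] = first, ["1"] = second }
    return (pattern:gsub("{(%d)}", function(i) return args[i] end))
end

-- Join items using the four patterns (two, start, middle, end).
local function format_list(items, patterns)
    local n = #items
    if n == 0 then return "" end
    if n == 1 then return items[1] end
    if n == 2 then return fill(patterns.two, items[1], items[2]) end

    -- 'end' joins the last two items.
    local result = fill(patterns["end"], items[n - 1], items[n])
    -- 'middle' prepends each inner item, moving right to left.
    for i = n - 2, 2, -1 do
        result = fill(patterns.middle, items[i], result)
    end
    -- 'start' prepends the first item.
    return fill(patterns.start, items[1], result)
end

-- Illustrative German patterns (an assumption, not library data).
local german = {
    ["two"] = "{0} und {1}",
    ["start"] = "{0}, {1}",
    ["middle"] = "{0}, {1}",
    ["end"] = "{0} und {1}"
}

print(format_list({"Banane", "Apfel"}, german))
-- Banane und Apfel
print(format_list({"Banane", "Apfel", "Mango", "Ananas"}, german))
-- Banane, Apfel, Mango und Ananas
```

Only the pattern table changes between languages here. Languages whose conjunction depends on the following word would need more than a fixed table, so treat this as a starting point.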