javascript regex

Regex to compare strings with Umlaut and non-Umlaut variations


Can anyone help me with a JavaScript regular expression that I can use to compare strings that should count as the same once their non-umlaut spellings are taken into account?

For example, in German the word Grüße can also be written Gruesse. These two strings are to be considered identical. The mappings (ignoring case for the moment) are:

    ä = ae, ö = oe, ü = ue, ß = ss

As there are not many pairs to consider, I could do a replace for each variation, but I'm wondering if there is a more elegant way, especially as this use case might need to be extended in future to include e.g. Scandinavian characters.


Solution

  • Something like:

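    // Transliteration table: umlaut or ß -> ASCII spelling
    const tr = {"ä": "ae", "ü": "ue", "ö": "oe", "ß": "ss"};

    // Replace every character listed in tr with its transliteration
    const replaceUmlauts = function(s) {
        return s.replace(/[äöüß]/g, function($0) { return tr[$0]; });
    };

    // Two strings match if their transliterated forms are equal
    const compare = function(a, b) {
        return replaceUmlauts(a) === replaceUmlauts(b);
    };

    alert(compare("grüße", "gruesse")); // true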
    

    You can easily extend this by adding more entries to "tr" (and the matching characters to the regex).

    Not very elegant, but it works.
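As a rough sketch of how that extension might look, the variant below builds the character class from the table keys so the two cannot drift apart, lowercases the input to ignore case, and normalizes to NFC so that a decomposed "u" plus combining diaeresis is treated like "ü". The names `translit`, `fold` and `sameWord` are made up for this example, and the Scandinavian entries (å, æ, ø) are assumed transliterations that you may want to change.

```javascript
// Hypothetical extended table; the å/æ/ø mappings are assumptions.
const translit = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "å": "aa", "æ": "ae", "ø": "oe"
};

// Build the character class from the table keys, so adding an entry
// to the table is the only change needed.
const translitPattern = new RegExp("[" + Object.keys(translit).join("") + "]", "g");

// Reduce a string to its lowercase, transliterated form.
function fold(s) {
    return s
        .normalize("NFC")   // compose "u" + combining diaeresis into "ü"
        .toLowerCase()      // ignore case
        .replace(translitPattern, (ch) => translit[ch]);
}

// Two strings count as the same word if their folded forms are equal.
function sameWord(a, b) {
    return fold(a) === fold(b);
}

console.log(sameWord("Grüße", "GRUESSE"));            // true
console.log(sameWord("Smørrebrød", "smoerrebroed"));  // true
console.log(sameWord("Grüße", "Gruse"));              // false
```

Note that this kind of folding is lossy by design: a word that genuinely contains "ae" compares equal to one spelled with "ä", which is what the question asks for but may not suit every use case.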