javascript diacritics

Combining-diacritic-sensitive char replacement for a string


I have a bug in a program where I am replacing characters in a string:

let a = "o";
let b = "x";
let preString = "o\u030Ba"; // "őa", with "ő" written as "o" + U+030B (combining double acute accent)

let postString = preString.replace(a, b);

alert(postString);

The expected behaviour is to print "őa": the code replaces "o" with "x" in the string "őa", but there is no standalone "o" in the string to replace, so it should stay the same.

Instead it prints "x̋a", because "ő" is two characters, "o" followed by the combining double acute accent ◌̋ (U+030B). The replace function matches the "o" and leaves the combining diacritic behind, where it then displays on the "x".
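The two-character claim can be checked by listing the code points. This is a minimal sketch that assumes the string is stored in decomposed form; the escape \u030B is written out so the example does not depend on how the editor saved "ő".

```javascript
// "ő" written explicitly as "o" + U+030B (combining double acute accent)
const decomposed = "o\u030Ba";

// Format each code point as U+XXXX
const codePoints = [...decomposed].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(decomposed.length);    // 3
console.log(codePoints.join(" ")); // U+006F U+030B U+0061

// replace() sees a plain "o" at index 0 and swaps only that code unit
console.log(decomposed.replace("o", "x") === "x\u030Ba"); // true
```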

How would I get my expected behaviour? I don't believe normalizing the strings would solve anything. The only other solution I can think of would be to split the string so that I get an array [o◌̋, a] and can iterate over that. Unless there is a RegExp feature I'm missing?
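The splitting idea could be sketched as follows. The helper names splitClusters and replaceWholeCluster are made up for illustration, and a "cluster" here is simply one non-mark character plus any combining marks after it, which is an approximation of a full grapheme cluster (Intl.Segmenter with granularity "grapheme" is the more complete tool where it is available).

```javascript
// Split into clusters: one non-mark code point followed by any number of
// combining marks (\p{M}); stray leading marks form their own cluster.
// Unicode property escapes require the "u" flag.
function splitClusters(str) {
  return str.match(/\P{M}\p{M}*|\p{M}+/gu) || [];
}

// Replace a cluster only when the whole cluster equals the target.
function replaceWholeCluster(str, target, replacement) {
  return splitClusters(str)
    .map((cluster) => (cluster === target ? replacement : cluster))
    .join("");
}

const clusters = splitClusters("o\u030Ba");
console.log(clusters.length); // 2

// "o" + U+030B is a different cluster from a bare "o", so nothing changes
console.log(replaceWholeCluster("o\u030Ba", "o", "x") === "o\u030Ba"); // true
console.log(replaceWholeCluster("foo", "o", "x")); // fxx
```

On normalization: "o\u030B".normalize("NFC") does compose to the single character U+0151 for this particular pair, but many base and mark combinations have no precomposed form, so normalizing alone is not a general fix.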


Solution

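  • You can use a regular expression that matches the base character followed by a combining mark. In the regex /o[\u0300-\u036f]/g, [\u0300-\u036f] is the Combining Diacritical Marks block (U+0300 to U+036F), which covers the common accents but not every combining mark in Unicode, and the g flag ensures that all occurrences are replaced. This way, the "o" and its diacritic are treated as a single unit: the whole "ő" is replaced with "x", so no stray mark is left behind, and a bare "o" with no mark after it is not matched.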

    let a = "o";
    let b = "x";
    let preString = "o\u030Ba"; // "őa": "o" + U+030B (combining double acute accent)
    
    // Match "o" together with the one combining mark that follows it
    let combiningMarksRegex = /o[\u0300-\u036f]/g;
    let postString = preString.replace(combiningMarksRegex, b);
    alert(postString); // "xa"
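The regex above replaces the whole "ő". If the goal is the behaviour the question asks for, where "ő" is left untouched and only a bare "o" is replaced, a negative lookahead is one option. This is a sketch rather than a definitive implementation: replaceBare and escapeRegExp are hypothetical helper names, and \p{M} (any Unicode mark, which needs the u flag) is used in place of the narrower \u0300-\u036f range.

```javascript
// Escape regex metacharacters so the search text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace `target` only where it is NOT followed
// by a combining mark, so accented forms of the character are skipped.
function replaceBare(str, target, replacement) {
  const pattern = new RegExp(escapeRegExp(target) + "(?!\\p{M})", "gu");
  // A function replacement avoids special handling of "$" in `replacement`.
  return str.replace(pattern, () => replacement);
}

// "o" + U+030B is skipped, so the string is unchanged
console.log(replaceBare("o\u030Ba", "o", "x") === "o\u030Ba"); // true

// A bare "o" elsewhere in the string is still replaced
console.log(replaceBare("o\u030Bao", "o", "x") === "o\u030Bax"); // true
```

The lookahead only inspects what follows the match, so it works without splitting the string first. It assumes the input is in decomposed form; a precomposed "ő" (U+0151) contains no "o" code unit and is never matched in the first place.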