regexgoogle-sheetsgoogle-sheets-formulaarray-formulasgoogle-sheets-query

Remove diacritics via formula on Google Sheets (Part 2)


With the help of a StackOverflow member, we have created a way to remove diacritics from personal names, these diacritics are saved in a list and as they appear in their names, they are removed and converted to common keyboard letters.

=ARRAYFORMULA(PROPER(IFERROR(REGEXREPLACE(C2:C, TEXTJOIN("|", 1, A2:A1000),
 VLOOKUP(REGEXEXTRACT(C2:C, TEXTJOIN("|", 1, A2:A1000)), A2:B1000, 2*
 SIGN(ROW(A2:A1000)), 0)&""), C2:C)))

But ... What happened was that in one of the names instead of the formula converting the letter "í" to "i", it converted to "a"

I would like help finding out what happened as it is the first time this has occurred and I could not find the problem.

The name was:

Á. Di María

And converted to:

A. Di Maraa

The correct form would be:

A. Di Maria

Both diacritics are saved in the conversion list, so it's not the fault, there's something I'm missing but I don't know what it is.

I leave here the link to the spreadsheet, so you can make it easier to understand what happened:

https://docs.google.com/spreadsheets/d/1yfB8GskVU_ciFKuzae9XQF-pi3y6jsYtsanN46vmNOs/edit?usp=sharing


Solution

  • =ARRAYFORMULA(PROPER(SUBSTITUTE(SUBSTITUTE(TRANSPOSE(QUERY(TRANSPOSE(
     IFERROR(REGEXREPLACE(REGEXEXTRACT(LOWER(
     SUBSTITUTE(INDIRECT("C2:C"&COUNTA(C2:C)+1), " ", "♥")), REPT("(.)", 
            LEN(INDIRECT("C2:C"&COUNTA(C2:C)+1)))), VLOOKUP(REGEXEXTRACT(LOWER(
     SUBSTITUTE(INDIRECT("C2:C"&COUNTA(C2:C)+1), " ", "♥")), REPT("(.)", 
            LEN(INDIRECT("C2:C"&COUNTA(C2:C)+1)))), LOWER(A:B), 1, 0), VLOOKUP(REGEXEXTRACT(
     SUBSTITUTE(INDIRECT("C2:C"&COUNTA(C2:C)+1), " ", "♥"), REPT("(.)", 
            LEN(INDIRECT("C2:C"&COUNTA(C2:C)+1)))), LOWER(A:B), 2, 0)), REGEXEXTRACT(
     SUBSTITUTE(INDIRECT("C2:C"&COUNTA(C2:C)+1), " ", "♥"), REPT("(.)", 
            LEN(INDIRECT("C2:C"&COUNTA(C2:C)+1)))))),,999^99)), " ", ), "♥", " ")))
    

    0