splitpowerbipowerquerypowerbi-desktopm

Power BI M code Split column from Lowercase to Uppercase won't include diacritics (accents)


In Power BI, I have a column that I need to split all values from Lowercase to Uppercase:

Column1
Blue
Blue
Energy
Energy
BlueÉnergie
Énergie
ConformitéBlue
Blue
ConformitéÉnergie
Énergie

The obvious solution is to use the built in function which ends up looking like this:

= Table.SplitColumn(Source, "Column1", Splitter.SplitTextByCharacterTransition({"a".."z"}, {"A".."Z"}), {"Column1.1", "Column1.2"})

The problem is that I don't get the expected result, ending up with this:

Column1.1 Column1.2
Blue Energy
Blue null
Energy null
Energy Blue
BlueÉnergie null
Énergie Blue
ConformitéBlue null
Blue Conformité
ConformitéÉnergie null
Énergie Conformité

when the expected result should be :

Column1.1 Column1.2
Blue Energy
Blue null
Energy null
Energy Blue
Blue Énergie
Énergie Blue
Conformité Blue
Blue Conformité
Conformité Énergie
Énergie Conformité

How can I get the expected result as concisely as possible such that the column will properly split from ALL lowercase characters to ALL uppercase characters?

Note: spaces and special characters do appear in my data set so I cannot split on any delimiters


Solution

  • @michal is on the right track but rather than list out every character, you can use the unicode blocks. See https://powerquery.how/character-fromnumber/.

    When you write {"a".."z"}, under the covers, PQ is saying get me the unicode block from 66 to 91. You can leverage this by seeing other unicode blocks here: https://en.wikipedia.org/wiki/List_of_Unicode_characters and just use the start and end characters for each block you want to insert. i.e. {"Ā".."ſ"} gets you 192 to 319 which is extended latin containing cedillas,acutes,carons etc