In Power BI, I have a column that I need to split all values from Lowercase to Uppercase:
Column1
Blue
Blue
Energy
Energy
BlueÉnergie
Énergie
ConformitéBlue
Blue
ConformitéÉnergie
Énergie
The obvious solution is to use the built in function which ends up looking like this:
= Table.SplitColumn(Source, "Column1", Splitter.SplitTextByCharacterTransition({"a".."z"}, {"A".."Z"}), {"Column1.1", "Column1.2"})
The problem is that I don't get the expected result, ending up with this:
Column1.1 | Column1.2 |
---|---|
Blue | Energy |
Blue | null |
Energy | null |
Energy | Blue |
BlueÉnergie | null |
Énergie | Blue |
ConformitéBlue | null |
Blue | Conformité |
ConformitéÉnergie | null |
Énergie | Conformité |
when the expected result should be :
Column1.1 | Column1.2 |
---|---|
Blue | Energy |
Blue | null |
Energy | null |
Energy | Blue |
Blue | Énergie |
Énergie | Blue |
Conformité | Blue |
Blue | Conformité |
Conformité | Énergie |
Énergie | Conformité |
How can I get the expected result as concisely as possible such that the column will properly split from ALL lowercase characters to ALL uppercase characters?
Note: spaces and special characters do appear in my data set so I cannot split on any delimiters
@michal is on the right track but rather than list out every character, you can use the unicode blocks. See https://powerquery.how/character-fromnumber/.
When you write {"a".."z"}, under the covers, PQ is saying get me the unicode block from 66 to 91. You can leverage this by seeing other unicode blocks here: https://en.wikipedia.org/wiki/List_of_Unicode_characters and just use the start and end characters for each block you want to insert. i.e. {"Ā".."ſ"} gets you 192 to 319 which is extended latin containing cedillas,acutes,carons etc