excelpowerquerydata-analysism

Power Query - splitting column


I hope someone can help with this one.

A column in my power query has the names of various airlines in one string. An example is:

American AirlinesBritishAirwaysFinnairAlphaSky

I need to split this column into the separate airlines, but given there are no separator or special characters between the airlines, the only way I can do this is to split the column using the 'By lowercase to uppercase' delimiter.

This works fine for most airlines, but in the example above, power query will also split AlphaSky into separate columns (Alpha and Sky), but this airline name is actually AlphaSky (with the capital S).

I thought about using a reference table and comparing against the column, but this doesn't work as the column can be any combination of airlines, so a match for each airline is impossible as far as I can see.

At the moment I have a step to split the column as above and then to merge the split columns into one using | as the delimiter. Then I have a number of 'find and replace' steps to remove the incorrect delimiters. This is working, but the number of 'find and replace' steps is growing due to the number of official airline names that have a capital letter mid name. The issue also occurs when the airline name is all capitals (eg KLM, ANA etc) where no | is inserted due to the next character also being in upper case.

Can anyone advise if there is a more elegant solution, or whether the above solution is the best given the data format I have to work with?


Solution

  • From your data, I am assuming the only differences we need to account for are

    Start with these two tables, named as you see in the screenshots:
    enter image description here
    enter image description here

    For each table: Data => Get&Transform => from Table

    Modify the Airlines table code to read:

    let
        Source = Excel.CurrentWorkbook(){[Name="Airlines"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Alist", type text}}),
        #"Remove Spaces" = Table.AddColumn(#"Changed Type", "NoSpace", each Text.Replace([Alist]," ",""))
    in
        #"Remove Spaces"
    

    That will add a second column with the spaces removed from the proper name.

    For the main code segment:

    let
        Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Data", type text}}) ,
        
        #"Split Airlines" = Table.AddColumn(#"Changed Type", "Split", (r)=> 
            [a=Airlines[Alist],
             b=Airlines[NoSpace],
             c=List.Transform(a, each Text.Contains(r[Data], _)),
             d=List.Transform(b, each Text.Contains(r[Data], _)),
             e=List.Zip({c,d}),
             f=List.Transform(e, each _{0} or _{1}),
             g=List.PositionOf(f,true,Occurrence.All),
             h=List.Accumulate(
                 g,
                 {},
                 (s,cur)=>s & {a{cur}}),
             i=Record.FromList(h, List.Transform(h, 
                    each "Airline." & Text.From(List.PositionOf(h,_)+1)))][i]),
    
        #"All Column Names" = List.Distinct(List.Combine(List.Transform(#"Split Airlines"[Split], each Record.FieldNames(_)))),
        #"To Table" = Table.ExpandRecordColumn(#"Split Airlines","Split", #"All Column Names"),
        #"Removed Columns" = Table.RemoveColumns(#"To Table",{"Data"}),
        #"Set Data Types" = Table.TransformColumnTypes(#"Removed Columns", 
                                List.Transform(Table.ColumnNames(#"Removed Columns"), each {_, type text}))
    in
        #"Set Data Types"
    

    The Split Airlines step could be shortened, but I laid it out step by step to make it easier to follow.
    The code should adjust for any number of airlines

    Results
    enter image description here