powerbitreepowerquerym

Conditionally duplicate records and modify one column using PowerQuery/M in Power BI


I'm using PowerQuery/M (well ,trying, this is my first time having to go beyond the basics) in Power BI. I have a json file representing a tree like structure, which I am trying to flatten out.

The business rules require that there shouldn't be a branch (categories) and a leaf (products) node in the same record, but due to human error, it happens sometimes (and I have no authority to make anyone follow the rules).

enter image description here

What I need to do is identify rows that have lists with items (non empty lists) for the leaf AND branch column, then replicate the row once for the leaf and once for the branch (I guess replicate once so I end up with 2 rows). This sounds a bit messy, so I've attached a screenshot.

Row 1 has a list with 1 item in the Leaf Node and a list with 2 items in the Branch Node (Row 2 is correct, having an empty list in the Leaf Node and a list with 2 items in the Branch Node). I need row 1 to be split into to rows, one with the Leaf Node list and getting rid of the Branch Node list, and the other keeping the Branch Node list and getting rid of the Leaf Node list (getting rid of could mean replacing with null or an empty list).

What I'm trying is to count the number of items in each list, row by row, if Leaf Node and Branch Node have > 0 list items, then do the copying and either replace the other list (in the leaf row, replace the branch list, and vice versa) with an empty list (or null, probably doesn't really matter). Rows that were already OK should be kept without any modification. Then I can keep expanding the columns and my life will be perfect.


Solution

  • you can try this in powerquery

    let Source = #table({"Leaf Node", "Branch Node"},{{{"apple"},{"banana","cherry"}},{{},{"date","elderberry"}},{{"fig","gooseberry"},{}}}),
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Added Index", "check", each List.Count([Leaf Node])>0 and List.Count([Branch Node])>0),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each [check] = true),
    Part1 = Table.AddColumn(Table.RemoveColumns(#"Filtered Rows",{"Branch Node"}), "Branch Node", each {}),
    Part2 = Table.AddColumn(Table.RemoveColumns(#"Filtered Rows",{"Leaf Node"}), "Leaf Node", each {}),
    combined =  Table.SelectRows(#"Added Custom", each [check] = false) & Part1 & Part2,
    #"Sorted Rows" = Table.Sort(combined,{{"Index", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index", "check"})
    in #"Removed Columns"
    

    It (a) adds an index (b) uses List count to find row with a count in both leaf a node columns (c) filers step b then removes leaf (d) filters step b then removes node (e) combines steps c+d+inverse of step b (f) resorts (g) removes extra columns

    enter image description here