I'm working with a dataset where multiple values in a cell are tagged under categories like Location, Host, Guest, and Bucket, and separated by line breaks. I need to split these into separate columns based on the tags, but the values are inconsistent, so separating by line break creates many unwanted columns.
Below are some cells that I want to format in a specific way (see format style image attached below)
Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls |
---|
Location: Comedy Store Belly Room Host: Tony Hinchcliffe Host: Brian Redban Guest: Rick Ingraham Bucket: Skyler Bucket: Rich Slaton Bucket: Jeff Schwind Bucket: Matt Devlin Bucket: Taylor Dean Bucket: Frank Castillo Bucket: Kenny Lion Bucket: Mike Stanley Bucket: Andrew Yanker Bucket: Sara Mostajabi Bucket: Timothy Greer Bucket: Kyle Henson |
Location: Comedy Store Main Room Host: Tony Hinchcliffe Host: Brian Redban Guest: Joe Rogan Guest: Dom Irrera Musical Guest: Pat Regan Musical Guest: Jeremiah Watkins Security: Iron Patriot Bucket: Chris Fried Bucket: Dan Madonia Bucket: Gail Travers Bucket: Chalus Robinson Bucket: Dan Bublitz Bucket: Mike Fairbare Bucket: Eddie Gibson Regular: Sara Weinshenk Regular: Kim Congdon |
Location: Comedy Store Belly Room Host: Tony Hinchcliffe Host: Brian Redban Guest: Chris D’Elia Guest: Neal Brennan Musical Guest: Pat Regan Artist: Ryan J. Ebelt Bucket Signups: 30 Bucket: Matty Chymbor Bucket: David Deery Bucket: Franklin Yee Bucket: Cole Young Bucket: Gali Kroup Bucket: Jeremy Paul Bucket: Tim Alexander Bucket: Mike Schmidt Regular: Sara Weinshenk Bucket: Eddie Gibson |
When I use a delimiter to separate the data, it works well for the Location and Host columns because they are similar throughout the column, but the other values (like Guest and Bucket) are inconsistent across cells, so they spill into different columns after separation. This is causing issues when trying to structure the data correctly.
I've already cleaned most of the dataset, but I'm stuck on how to handle these inconsistent patterns. I have spent hours trying to figure it out, tried using different formulas like textsplit,wrapcols,wraprows, etc but to no avail. I am so frustrated please help.
Could you suggest the best way to tackle this? Any advice or suggestions would be greatly appreciated!
Assuming a starting table of as you have it above, you can perform the following steps:
Bucket
) Replace value Bucket Signups:
with Bucket: Bucket Signups:
:
and left most:
List.Sum([Value])
with
Text.Combine([Value], ", ")
To give you are resulting table of:
Index | Location | Host | Guest | Bucket | Musical Guest | Security | Regular | Artist |
---|---|---|---|---|---|---|---|---|
1 | Comedy Store Belly Room | Tony Hinchcliffe, Brian Redban | Rick Ingraham | Skyler, Rich Slaton, Jeff Schwind, Matt Devlin, Taylor Dean, Frank Castillo, Kenny Lion, Mike Stanley, Andrew Yanker, Sara Mostajabi, Timothy Greer, Kyle Henson | ||||
2 | Comedy Store Main Room | Tony Hinchcliffe, Brian Redban | Joe Rogan, Dom Irrera | Chris Fried, Dan Madonia, Gail Travers, Chalus Robinson, Dan Bublitz, Mike Fairbare, Eddie Gibson | Pat Regan, Jeremiah Watkins | Iron Patriot | Sara Weinshenk, Kim Congdon | |
3 | Comedy Store Belly Room | Tony Hinchcliffe, Brian Redban | Chris D’Elia, Neal Brennan | Bucket Signups: 30, Matty Chymbor, David Deery, Franklin Yee, Cole Young, Gali Kroup, Jeremy Paul, Tim Alexander, Mike Schmidt, Eddie Gibson | Pat Regan | Sara Weinshenk | Ryan J. Ebelt |
From here, you can decide which columns to keep, rename, reorder etc...
Full Power Query:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls"),
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","Bucket Signups:","Bucket: Bucket Signups:",Replacer.ReplaceText,{"Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls"}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value", "Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls", Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, false), {"Category", "Value"}),
#"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter1",{{"Value", Text.Trim, type text}}),
#"Grouped Rows" = Table.Group(#"Trimmed Text", {"Index", "Category"}, {{"Values", each Text.Combine([Value], ", "), type text}}),
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Category]), "Category", "Values")
in
#"Pivoted Column"