excelpowerquerydata-cleaningdataformatdata-transform

How to Separate Data with Inconsistent Patterns into a Structured Format in Excel


Inconsistent Values in Cells

I'm working with a dataset where multiple values in a cell are tagged under categories like Location, Host, Guest, and Bucket, and separated by line breaks. I need to split these into separate columns based on the tags, but the values are inconsistent, so separating by line break creates many unwanted columns.

Below are some cells that I want to format in a specific way (see format style image attached below)

Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls
Location: Comedy Store Belly Room
Host: Tony Hinchcliffe
Host: Brian Redban
Guest: Rick Ingraham
Bucket: Skyler
Bucket: Rich Slaton
Bucket: Jeff Schwind
Bucket: Matt Devlin
Bucket: Taylor Dean
Bucket: Frank Castillo
Bucket: Kenny Lion
Bucket: Mike Stanley
Bucket: Andrew Yanker
Bucket: Sara Mostajabi
Bucket: Timothy Greer
Bucket: Kyle Henson
Location: Comedy Store Main Room
Host: Tony Hinchcliffe
Host: Brian Redban
Guest: Joe Rogan
Guest: Dom Irrera
Musical Guest: Pat Regan
Musical Guest: Jeremiah Watkins
Security: Iron Patriot
Bucket: Chris Fried
Bucket: Dan Madonia
Bucket: Gail Travers
Bucket: Chalus Robinson
Bucket: Dan Bublitz
Bucket: Mike Fairbare
Bucket: Eddie Gibson
Regular: Sara Weinshenk
Regular: Kim Congdon
Location: Comedy Store Belly Room
Host: Tony Hinchcliffe
Host: Brian Redban
Guest: Chris D’Elia
Guest: Neal Brennan
Musical Guest: Pat Regan
Artist: Ryan J. Ebelt
Bucket Signups: 30
Bucket: Matty Chymbor
Bucket: David Deery
Bucket: Franklin Yee
Bucket: Cole Young
Bucket: Gali Kroup
Bucket: Jeremy Paul
Bucket: Tim Alexander
Bucket: Mike Schmidt
Regular: Sara Weinshenk
Bucket: Eddie Gibson

Format Style Snippet

Used a Delimiter

When I use a delimiter to separate the data, it works well for the Location and Host columns because they are similar throughout the column, but the other values (like Guest and Bucket) are inconsistent across cells, so they spill into different columns after separation. This is causing issues when trying to structure the data correctly.

I've already cleaned most of the dataset, but I'm stuck on how to handle these inconsistent patterns. I have spent hours trying to figure it out, tried using different formulas like textsplit,wrapcols,wraprows, etc but to no avail. I am so frustrated please help.

Could you suggest the best way to tackle this? Any advice or suggestions would be greatly appreciated!


Solution

  • Assuming a starting table of as you have it above, you can perform the following steps:

    1. Add an Index column
    2. Split column in rows by newline
      Split by Delimiter to rows
    3. Optional (if you really need this under Bucket) Replace value Bucket Signups: with Bucket: Bucket Signups:
      Repalce value
    4. Split column again but now to Columns by : and left most: Split to columns
    5. Trim values
    6. Group by and Combine values Group By Then you will need to update the formula bar, replace:
    List.Sum([Value])
    

    with

    Text.Combine([Value], ", ")
    
    1. Pivot on Category column
      Pivot

    To give you are resulting table of:

    Index Location Host Guest Bucket Musical Guest Security Regular Artist
    1 Comedy Store Belly Room Tony Hinchcliffe, Brian Redban Rick Ingraham Skyler, Rich Slaton, Jeff Schwind, Matt Devlin, Taylor Dean, Frank Castillo, Kenny Lion, Mike Stanley, Andrew Yanker, Sara Mostajabi, Timothy Greer, Kyle Henson
    2 Comedy Store Main Room Tony Hinchcliffe, Brian Redban Joe Rogan, Dom Irrera Chris Fried, Dan Madonia, Gail Travers, Chalus Robinson, Dan Bublitz, Mike Fairbare, Eddie Gibson Pat Regan, Jeremiah Watkins Iron Patriot Sara Weinshenk, Kim Congdon
    3 Comedy Store Belly Room Tony Hinchcliffe, Brian Redban Chris D’Elia, Neal Brennan Bucket Signups: 30, Matty Chymbor, David Deery, Franklin Yee, Cole Young, Gali Kroup, Jeremy Paul, Tim Alexander, Mike Schmidt, Eddie Gibson Pat Regan Sara Weinshenk Ryan J. Ebelt

    From here, you can decide which columns to keep, rename, reorder etc...

    Full Power Query:

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls", type text}}),
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
        #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls"),
        #"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","Bucket Signups:","Bucket: Bucket Signups:",Replacer.ReplaceText,{"Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls"}),
        #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value", "Location / Hosts / Guests / Regulars / Musicians / Band Members / Bucket Pulls", Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, false), {"Category", "Value"}),
        #"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter1",{{"Value", Text.Trim, type text}}),
        #"Grouped Rows" = Table.Group(#"Trimmed Text", {"Index", "Category"}, {{"Values", each Text.Combine([Value], ", "), type text}}),
        #"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Category]), "Category", "Values")
    in
        #"Pivoted Column"