I currently have a dataset that is formatted:
Company | Clients | Date | Type | Amt |
---|---|---|---|---|
comp1 | client1, client2, client3 | 01/02/22 | visa | $1500 |
comp2 | client1 | amex | $600 | |
comp3 | client3, client4, client5, client1 | 02/23/22 | check | $4000 |
comp4 | client6, client7, client8 | check | $1800 |
And I would like to end up with a dataset formatted for each client transaction:
Client | Date | Type | Amt | Company | Expense type |
---|---|---|---|---|---|
client1 | 01/02/22 | visa | $500 | comp1 | Company |
client2 | 01/02/22 | visa | $500 | comp1 | Company |
client3 | 01/02/22 | visa | $500 | comp1 | Company |
client1 | amex | $600 | comp2 | Company | |
client3 | 02/23/22 | check | $1000 | comp3 | Company |
client4 | 02/23/22 | check | $1000 | comp3 | Company |
client5 | 02/23/22 | check | $1000 | comp3 | Company |
client1 | 02/23/22 | check | $1000 | comp3 | Company |
client6 | check | $600 | comp4 | Company | |
client7 | check | $600 | comp4 | Company | |
client8 | check | $600 | comp4 | Company |
Thanks to user player0 for pointing me in the right direction with:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(IF(IFERROR(SPLIT(B1:B, ","))="",,
SPLIT(B1:B, ", ", )&""&C1:C&""&D1:D&""&E1:E/LEN(SUBSTITUTE(FLATTEN(
QUERY(TRANSPOSE(IFERROR(1/(1/(SPLIT(B1:B, ",")<>"")))),,9^9)), " ", ))&""&A1:A)), ""),
"where Col3 is not null format Col2'mm/dd/yy', Col4'$0'", ))
This comes very close to accomplishing what I'm trying to do, but shifts cells when there are missing values so the end result looks like:
Client | Date | Type | Amt | Company | Expense type |
---|---|---|---|---|---|
client1 | 01/02/22 | visa | $500 | comp1 | |
client2 | 01/02/22 | visa | $500 | comp1 | |
client3 | 01/02/22 | visa | $500 | comp1 | |
client1 | amex | $600 | comp2 | ||
client3 | 02/23/22 | check | $1000 | comp3 | |
client4 | 02/23/22 | check | $1000 | comp3 | |
client5 | 02/23/22 | check | $1000 | comp3 | |
client1 | 02/23/22 | check | $1000 | comp3 | |
client6 | check | $600 | comp4 | ||
client7 | check | $600 | comp4 | ||
client8 | check | $600 | comp4 |
To work around this I added an if statement for each column call that checks if its empty and fills in with a space if it is empty. This gets things in the correct order, but there may be a more efficient way? The last thing to do is populate the final column with the broad category of "company", basically populate all cells in col 6 to equal the header from col 1 in the original table. This new formula looks like this:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(IF(IFERROR(SPLIT(B2:B, ","))=""
,,SPLIT(B2:B, ",",)&""&IF(C2:C<>"",C2:C," ")&"
"&IF(!D2:D<>"",!D2:D," ")&"
"&IF(!E2:E<>"",!E2:E/LEN(SUBSTITUTE(FLATTEN(QUERY(TRANSPOSE(
IFERROR(1/(1/(SPLIT(!B2:B,",")<>"")))),,9^9))," ",))," ")&"
"&IF(!A2:A<>"",!A2:A," ")&"
"&!A1)), ""),
"Where Col5 is not null format Col2'mm/dd/yy', Col5'$0'", 0))
This generates the final field of "Company" but puts it appended to the previous column. I believe this is happening because ARRAYFORMULA
wants to be working with ranges of the same size and so interprets this as a string concatenation since it can't operate on the range? If anyone has other thoughts as to why please let me know!
Client | Date | Type | Amt | Company | Expense type |
---|---|---|---|---|---|
client1 | 01/02/22 | visa | $500 | comp1Company | |
client2 | 01/02/22 | visa | $500 | comp1Company | |
client3 | 01/02/22 | visa | $500 | comp1Company | |
client1 | amex | $600 | comp2Company | ||
client3 | 02/23/22 | check | $1000 | comp3Company | |
client4 | 02/23/22 | check | $1000 | comp3Company | |
client5 | 02/23/22 | check | $1000 | comp3Company | |
client1 | 02/23/22 | check | $1000 | comp3Company | |
client6 | check | $600 | comp4Company | ||
client7 | check | $600 | comp4Company | ||
client8 | check | $600 | comp4Company |
use:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(IF(IFERROR(SPLIT(B1:B, ","))="",,
SPLIT(B1:B, ", ", )&""&C1:C&""&D1:D&""&E1:E/LEN(SUBSTITUTE(FLATTEN(
QUERY(TRANSPOSE(IFERROR(1/(1/(SPLIT(B1:B, ",")<>"")))),,9^9)), " ", ))&""&A1:A)), ""),
"where Col3 is not null format Col2'mm/dd/yy', Col4'$0'", ))