I have a single column that contains mixed data (strings, numbers, sometimes blanks) and from time to time, there are marker rows that delimit blocks of data. What I would like to do is:
Example (simplified):
| Column |
|---|
| data1 |
| data2 |
| Specific competencies. <-- start collecting (from next row) |
| data3 |
| data4 |
| Basic knowledge of Mathematics course. <-- stop collecting (not included) |
| data5 |
| Specific competencies. <-- start collecting (from next row) |
| data6 |
| data7 |
| data8 |
| data9 |
| data10 |
| Basic knowledge of Science courses. <-- stop collecting (not included) |
| data11 |
| ... |
Desired output (each block in its own column):
| Block 1 | Block 2 | ... |
|---|---|---|
| data3 | data6 | ... |
| data4 | data7 | |
| data8 | ||
| data9 | ||
| data10 |
For a single block, I can do:
=LET(range,B:B, FILTER(range,ISBETWEEN( ROW(range), MATCH("Specific competencies.",range,0), MATCH(TRUE, REGEXMATCH(range, "^Basic knowledge of.* courses?\.$"), 0))))
But that only extracts the first block, not multiple ones.
I have also tried using REDUCE + LAMBDA + HSTACK to iterate row by row and build the results, but I run into errors.
Any help is appreciated. Thank you!
Assuming that you're not wanting to output the header row, you should be able to use the following formula.
=index(map(split(join("♠", if(regexmatch(B:B, "^Specific competencies|Basic knowledge of"), "♦", B:B)), "♦"), lambda(c, tocol(split(c, "♠")))))
You can use any two characters for the separators, but try to use uncommon characters. The formula first replaces "Specific competencies..." and "Basic knowledge of..." with the spade character "♠" and then joins the entire column together using the diamond character "♦". Next, it splits on the spade, so that each group is in its own column. Finally, it MAPs over those values and uses TOCOL/SPLIT to flatten the group values into an array down each column.
Edit: The formula returns an error if there are numerical values due to the REGEXMATCH. This can be easily overcome by attaching an empty string onto all the values in column B with REGEXMATCH(""&B:B; however, the next roadblock occurs because there are around 22,000 rows and joining together that many rows exceeds the 50,000 character limit. In addition, the formula wasn't returning just the values between the start and ending cells.
Therefore, I'm changing the formula to the following:
=bycol(wrapcols(filter(sequence(rows(A:A)),regexmatch(""&A:A,"^Specific competencies|Basic knowledge of")),2),lambda(c,chooserows(A:A,sequence(index(c,2)-index(c,1)-1,1,index(c,1)+1))))
This formula first FILTERs the SEQUENCE(ROWS(A:A)) for all the starting and ending cells. It then uses WRAPCOLS to make vertical pairs of start/end indices. Then, if uses BYCOL to iterate across the pairs, using them to generate the SEQUENCE of row numbers to pull from column A using CHOOSEROWS.