google-sheetsgoogle-sheets-formulasparse-matrix

Database Rows to static columns


I have a single column DB file that is rigidly formatted. It doesn't have a header or field names in the export that I've imported into a Goggle Sheet. I need help with how to approach this task of reformatting the original column into multiple columns so I can use the filter tools in Google Sheets.

The following tables are in a sample shared sheet: here

The format of the single column DB is:

In column A1:A:

&++ BEGIN
[A Title A]    
Nonsense sentence    
~cat    
£££ or dog    
Nonsense sentence again    
~cat    
£££ or dog    
£££ to fish    
£££ this man    
Nonsense sentence 10    
~dog    
£££ by fish    
More nonsense sentences    
~cat again    
£££ or man    
£££ and fish    
[Title 2]    
Nonsense sentence 21    
~car    
£££ or bus    
£££ a train    
Nonsense sentence again2    
~boat and trailer    
£££ sea    
Nonsense sentence 40    
~lorry caravan fish    
£££ with a roof    
£££ the swimming pool    
£££ some rubbish    
&++ EOF (data)

The &++ BEGIN and &++ EOF... appear only once - at the start and the end

The result I'm looking for is (in four columns C, D, E, F):

Nonsense sentence         ~cat                  £££ or dog                                                  [A Title A]
Nonsense sentence again   ~cat                  £££ or dog, £££ to fish, £££ this man                       [A Title A]
Nonsense sentence 10      ~dog                  £££ by fish                                                 [A Title A]
More nonsense sentences   ~cat again            £££ or man, £££ and fish                                    [A Title A]
Nonsense sentence 21      ~car                  £££ or bus, £££ a train                                     [Title 2]
Nonsense sentence again2  ~boat and trailer     £££ sea                                                     [Title 2]
Nonsense sentence 40      ~lorry caravan fish   £££ with a roof, £££ the swimming pool, £££ some rubbish    [Title 2]

The actual order of the columns the data ends up in is not important - but there are only four columns. The option column simply needs all the options concatenated into a single string with each separated by ", "

Explaining the format required for the result is so simple and can be managed manually very easily with just the basic knowledge of Google Sheets that I possess. Trying to encapsulate it all in one arrayformula in cell C1 has defeated me - because I don't want to manually format the 20,000+ rows of the original data!

Has anybody got experience of dealing with the variable number of rows between sections using a Google Sheets formula? I can probably resort to a script but would rather not if it can be avoided (I have to explain the process to others).

I've got some way towards a result using WRAPROWS(TOROW(A:A,3),4) but this fails at the first instance where the number of rows the variable data takes up exceeds 1.


Solution

  • To do that with a Google Sheets formula, use scan() and reduce(), like this:

    =let(
      data, tocol(A1:A, 1),
      ordinals, scan(, data, lambda(a, c, ifs(
        isblank(c) + regexmatch(c, "^&"), iferror(ø),
        regexmatch(c, "^\w"), a + 1,
        true, a 
      ))),
      titles, scan(, data, lambda(a, c, ifs(
        isblank(c), iferror(ø),
        regexmatch(c, "^\["), c,
        true, a 
      ))),
      reduce(tocol(æ, 2), sequence(max(ordinals)), lambda(a, i, let(
        get_, lambda(a, regex, filter(a, ordinals = i, regexmatch(a, regex))),
        vstack(a, hstack(
          get_(data, "^\w"), get_(data, "^\~"), join(", ", get_(data, "^£££")),
          single(get_(titles, "^\["))
        ))
      )))
    )
    

    See let(), tocol(), scan(), regexmatch(), reduce(), sequence(), lambda(), filter(), vstack() and hstack().