google-sheetsgoogle-sheets-formula

IMPORTRANGE with REGEXREPLACE/REGEXMATCH function


I have dummy data related to product code of books:

XXX AAA BB Document Code Test Type
P01 BOO 01 BK-001-CR 1. Content Quality Testing
   - Proofreading
   - Fact-checking
   - Final editing
2. Design and Layout Testing
P01 BOO 02 BK-002-CR 1. Physical Production Testing
2. Distribution Testing
P02 OOK 01 BK-001-CR 1. Content Quality Testing
   - Proofreading
   - Fact-checking
   - Final editing
2. Design and Layout Testing
P03 KOO 01 BK-001-RT 1. Content Quality Testing
   - Proofreading
   - Fact-checking
   - Final editing
2. Design and Layout Testing

as listed in this link.

I want to clean it up into expected output as follows:

Product Code Document Code Test Type
P01.BOO.01
P02.OOK.01
BK-001-CR Content Quality Testing
   - Proofreading
   - Fact-checking
   - Final editing
P01.BOO.01
P02.OOK.01
BK-001-CR Design and Layout Testing
P01.BOO.02 BK-002-CR Physical Production Testing
P01.BOO.02 BK-002-CR Distribution Testing
P03.KOO.01 BK-001-RT Content Quality Testing
   - Proofreading
   - Fact-checking
   - Final editing
P03.KOO.01 BK-001-RT Design and Layout Testing

as listed in this link.

For the output, I have tried with formula in Product Code column:

=ARRAYFORMULA( 
  TEXTJOIN(" "; TRUE; 
    FILTER(
      IMPORTRANGE("https://docs.google.com/spreadsheets/d/1dIVsBszYwQnELHwu8C7BaamZnQTz2zogzKlrkq72IZQ/edit?gid=0#gid=0"; "Test!B2:B5") & "." & 
      IMPORTRANGE("https://docs.google.com/spreadsheets/d/1dIVsBszYwQnELHwu8C7BaamZnQTz2zogzKlrkq72IZQ/edit?gid=0#gid=0"; "Test!C2:C5") & "." &
      IMPORTRANGE("https://docs.google.com/spreadsheets/d/1dIVsBszYwQnELHwu8C7BaamZnQTz2zogzKlrkq72IZQ/edit?gid=0#gid=0"; "Test!D2:D5"); 
      (REGEXMATCH(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1dIVsBszYwQnELHwu8C7BaamZnQTz2zogzKlrkq72IZQ/edit?gid=0#gid=0"; "Test!F2:F5"); I2) + 
      REGEXMATCH(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1dIVsBszYwQnELHwu8C7BaamZnQTz2zogzKlrkq72IZQ/edit?gid=0#gid=0"; "Test!B2:B5"); H2)) > 0
    )
  )
)

In Document Code column:

=QUERY(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1dIVsBszYwQnELHwu8C7BaamZnQTz2zogzKlrkq72IZQ/edit?gid=0#gid=0"; "Test!E2:F5"); "SELECT Col1 WHERE Col2 CONTAINS '" & I2 & "'")

In Test Type column:

=ARRAYFORMULA(
  UNIQUE(
    FILTER(
      TRIM(
        FLATTEN(
          SPLIT(
            REGEXREPLACE(
              IMPORTRANGE("https://docs.google.com/spreadsheets/d/1dIVsBszYwQnELHwu8C7BaamZnQTz2zogzKlrkq72IZQ/edit?gid=0#gid=0"; "Test!F2:F5"); 
              "\d+\."; 
              "~"
            ); 
            "~"
          )
        )
      ); 
      (TRIM(FLATTEN(SPLIT(REGEXREPLACE(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1dIVsBszYwQnELHwu8C7BaamZnQTz2zogzKlrkq72IZQ/edit?gid=0#gid=0"; "Test!F2:F5"); "\d+\."; "~"); "~"))) <> "")
    )
  )
)

However, there is still an error and it still does not match the expected output. enter image description here

Is there any other formula?


Solution

  • Here's a generalized approach which you may adapt accordingly:

    =let(Λ;importrange("1dIVsBszYwQnELHwu8C7BaamZnQTz2zogzKlrkq72IZQ";"B2:F7");Σ;lambda(x;choosecols(Λ;x));a;Σ(1);b;Σ(2);c;Σ(3);d;Σ(4);e;Σ(5);
     reduce(tocol(;1);unique(tocol(d;1));lambda(x;y;vstack(x;reduce(tocol(;1);sequence(max(index(ifna(--regexextract(split(xlookup(y;d;e);char(10));"^\d+")))));lambda(f;q;vstack(f;
     hstack(join(char(10);filter(a&"."&b&"."&c;d=y));y;regexextract(xlookup(y;d;e);"(?s).*"&q&". (.*?)(?:\n"&q+1&". |$)")))))))))
    

    enter image description here