javascript regex algorithm text

Regex Pattern Starting from X Pattern until X Pattern


I have been trying to figure out the right regular expression but keep failing.

I need to group the lines of a text file so that each group starts at a 6-digit number sequence and runs until the next 6-digit number sequence.

From the data below, one group would be the following three lines (the full file contents come right after them):

000001  10_SEC_SLATE_-_ACT_1.NEW.02      V     C        01:00:00:00 01:00:08:00 00:59:50:00 00:59:58:00 
*FROM CLIP NAME:  10 SEC SLATE - ACT 1.NEW.02 
*SOURCE FILE: 10 SEC SLATE - ACT 1.NEW.02
TITLE:   Cities_of_the_Underworld_Ep_101_Lock_Cut_210512 
FCM: NON-DROP FRAME
000001  10_SEC_SLATE_-_ACT_1.NEW.02      V     C        01:00:00:00 01:00:08:00 00:59:50:00 00:59:58:00 
*FROM CLIP NAME:  10 SEC SLATE - ACT 1.NEW.02 
*SOURCE FILE: 10 SEC SLATE - ACT 1.NEW.02
000002  KARGA7_SLATE.MOV                 V     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
*FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
*SOURCE FILE: KARGA7_SLATE.MOV
000003  KARGA7_SLATE.MOV                 A     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
*FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
*SOURCE FILE: KARGA7_SLATE.MOV
000004  KARGA7_SLATE.MOV                 A2    C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
*FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
*SOURCE FILE: KARGA7_SLATE.MOV
000005  B004_C009_12071C                 V     C        10:17:25:18 10:17:26:15 01:00:00:00 01:00:00:12 
M2      B004_C009_12071C                          045.1 10:17:25:18 
*FROM CLIP NAME:  LOS1_201207_B01009.NEW.01 
*SOURCE FILE: B004_C009_12071C

Solution

  • We can try using match with the following regex pattern:

    \b\d{6}\b[\s\S]*?(?=\b\d{6}\b|$)
    

    This matches from a 6-digit number up to, but not including, the next 6-digit number or the end of the input.

    // Sample data from the question.
    const input = `
    TITLE:   Cities_of_the_Underworld_Ep_101_Lock_Cut_210512 
    FCM: NON-DROP FRAME
    000001  10_SEC_SLATE_-_ACT_1.NEW.02      V     C        01:00:00:00 01:00:08:00 00:59:50:00 00:59:58:00 
    *FROM CLIP NAME:  10 SEC SLATE - ACT 1.NEW.02 
    *SOURCE FILE: 10 SEC SLATE - ACT 1.NEW.02
    000002  KARGA7_SLATE.MOV                 V     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
    *FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
    *SOURCE FILE: KARGA7_SLATE.MOV
    000003  KARGA7_SLATE.MOV                 A     C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
    *FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
    *SOURCE FILE: KARGA7_SLATE.MOV
    000004  KARGA7_SLATE.MOV                 A2    C        01:00:00:00 01:00:10:00 00:59:50:00 01:00:00:00 
    *FROM CLIP NAME:  KARGA7_SLATE.NEW.01 
    *SOURCE FILE: KARGA7_SLATE.MOV
    000005  B004_C009_12071C                 V     C        10:17:25:18 10:17:26:15 01:00:00:00 01:00:00:12 
    M2      B004_C009_12071C                          045.1 10:17:25:18 
    *FROM CLIP NAME:  LOS1_201207_B01009.NEW.01 
    *SOURCE FILE: B004_C009_12071C
    `;
    // The g flag returns every group; match() returns null when nothing matches.
    const items = input.match(/\b\d{6}\b[\s\S]*?(?=\b\d{6}\b|$)/g);
    console.log(items);

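    Note that we use [\s\S]*? in the regex pattern as a substitute for dot-all mode (the s flag), so that a match can span multiple lines. The quantifier is lazy, so each match stops at the first following 6-digit number rather than at the last one.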
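
One caveat with the pattern above: \b\d{6}\b also matches a standalone 6-digit number in the middle of a line, which would start a new group there. If that is a concern, a possible variant is to accept a 6-digit number only at the start of a line. The sketch below is one way to do that, not part of the original answer; the helper name splitIntoGroups and the shortened sample data are made up for illustration.

```javascript
// Hypothetical helper (an assumption, not from the original answer):
// returns the groups that start on a line beginning with a 6-digit number.
function splitIntoGroups(text) {
  // With the m flag, ^ matches at the start of every line; [ \t]* tolerates
  // leading indentation. (?![\s\S]) means "no character follows", i.e. end of
  // input. A plain $ is avoided because under the m flag it would also match
  // at the end of every line and cut each group off after its first line.
  const pattern = /^[ \t]*\d{6}\b[\s\S]*?(?=^[ \t]*\d{6}\b|(?![\s\S]))/gm;
  const matches = text.match(pattern) || []; // match() returns null on no match
  return matches.map((group) => group.trim());
}

// Shortened, made-up sample in the same shape as the question's data.
const sample = [
  "TITLE:   Example_Title_210512",
  "FCM: NON-DROP FRAME",
  "000001  CLIP_A      V     C        01:00:00:00 01:00:08:00",
  "*FROM CLIP NAME:  CLIP A",
  "000002  CLIP_B      V     C        01:00:00:00 01:00:10:00",
  "M2      CLIP_B      045.1 10:17:25:18",
  "*SOURCE FILE: CLIP_B 123456", // trailing 6-digit number stays in group 2
].join("\n");

const groups = splitIntoGroups(sample);
console.log(groups.length); // 2
console.log(groups[1]);
```

The header lines before the first 6-digit number are skipped, as with the original pattern, and the trailing 123456 does not open a third group because it is not at the start of a line.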