sqlregexoracle-databaseregexp-like

Table names classifications starting with and ending with in Oracle DB


Let me explain with simple examples , basically we have some complex design of table naming conventions at least in stage schema there are 4 patterns.

Total LAYERS 4

  1. STAGE LAYER
  2. BASIC DATA STORE LAYER
  3. EDW LAYER
  4. MART LAYER

All the below pattern specifically fall in STAGE Layer only

STAGE LAYER - 'DS_00','DSC_00','DSP_00','DT'

In this schema when looking for search pattern combination i need to consider the tables beginning with 'DS_00','DSC_00','DSP_00','DT' (the below explained patterns) and Ending with (T,W,W01,W02,W03,W1,W2,W3),so the table must be classified as STANADARD_NAMING_CONV else NON_STANDARD_NAMING_CONV

T - Technical W - Working Tables

Pattern 1

DS_00_to_99 (00_to_99 - is the number to identify which system the table is being loaded from source) DS_SYSTEM_NO_TABLENAME_ENDING SYSTEM_NO - 00_to_99 TABLENAME - full table name ENDING - not all cases the table is clssified as (T,W,W01,W02,W03,W1,W2,W3) so the numbers range as mentioned 00 to 99 is also same for other patterns 1 as well.

Pattern 2

DSC_00_to_99 (00_to_99 - is the number to identify which system the table is being loaded from source) DSC_SYSTEM_NO_TABLENAME_ENDING so the numbers range as mentioned 00 to 99 is also same for other patterns 2 as well.

Pattern 3

DSP_00_to_99 (00_to_99 - is the number to identify which system the table is being loaded from source) DS_SYSTEM_NO_TABLENAME_ENDING so the numbers range as mentioned 00 to 99 is also same for other patterns 3 as well.

Pattern 4

DT_ABC_FHSJ DS_TABLENAME_ENDING so the numbers range is not applicable for this pattern

BASIC DATA STORE LAYER - 'DS'

In this schema when looking for search pattern combination i need to consider the tables beginning with 'DB' and Ending with (D,F,L,T,W,W01,W02,W03,W1,W2,W3) ,so the table must be classified as STANADARD_NAMING_CONV else NON_STANDARD_NAMING_CONV

EDW LAYER - 'DE'

In this schema when looking for search pattern combination i need to consider the tables beginning with 'DB' and Ending with (D,F,L,T,W,W01,W02,W03,W1,W2,W3) ,so the table must be classified as STANADARD_NAMING_CONV else NON_STANDARD_NAMING_CONV and in this layer there are some other tables which begin with TBD_ must be classified as TO_BE_DROPPED and TMP_ must be classified as TEMPORARY_TABLE

MART LAYER - 'DM'

In this schema when looking for search pattern combination i need to consider the tables beginning with 'DB' and Ending with (D,F,L,T,W,W01,W02,W03,W1,W2,W3,A,AD,AM,AQ,AY) ,so the table must be classified as STANADARD_NAMING_CONV else NON_STANDARD_NAMING_CONV and in this layer there are some other tables which begin with TBD_ must be classified as TO_BE_DROPPED and TMP_ must be classified as TEMPORARY_TABLE

D - Dimension tables
F - Fact tables
L - Lookup tables
T - Technical tables
W - Work tables
A - Aggregate Fact Tables
AD - Aggregate Fact Daily
AM - Aggregate Fact Monthly
AQ - Aggregate Fact Quarterly
AY - Aggregate Fact Yearly

Query 1 - OLD Query

SELECT owner,
         object_name,
         beginning,
         ending,
         --count(*),
         CASE
         WHEN ( beginning, ending ) IN (
                ( 'DS', 'T' ),
                ( 'DS', 'W' ),
                ( 'DS', 'W01' ),
                ( 'DS', 'W02' ),
                ( 'DS', 'W03' ),
                ( 'DS', 'W1' ),
                ( 'DS', 'W2' ),
                ( 'DS', 'W3' ),
                ( 'DB', 'D' ),
                ( 'DB', 'F' ),
                ( 'DB', 'L' ),
                ( 'DB', 'T' ),
                ( 'DB', 'W' ),
                ( 'DB', 'W01' ),
                ( 'DB', 'W02' ),
                ( 'DB', 'W03' ),
                ( 'DB', 'W1' ),
                ( 'DB', 'W2' ),
                ( 'DB', 'W3' ),
                ( 'DE', 'D' ),
                ( 'DE', 'F' ),
                ( 'DE', 'L' ),
                ( 'DE', 'T' ),
                ( 'DE', 'W' ),
                ( 'DE', 'W01' ),
                ( 'DE', 'W02' ),
                ( 'DE', 'W03' ),
                ( 'DE', 'W1' ),
                ( 'DE', 'W2' ),
                ( 'DE', 'W3' ),
                ( 'DA', 'D' ),
                ( 'DA', 'F' ),
                ( 'DA', 'L' ),
                ( 'DA', 'T' ),
                ( 'DA', 'W' ),
                ( 'DA', 'W01' ),
                ( 'DA', 'W02' ),
                ( 'DA', 'W03' ),
                ( 'DA', 'W1' ),
                ( 'DA', 'W2' ),
                ( 'DA', 'W3' ),
                ( 'DA', 'A' ),
                ( 'DA', 'AD' ),
                ( 'DA', 'AM' ),
                ( 'DA', 'AQ' ),
                ( 'DA', 'AY' )
              )
         THEN 'STANDARD_NAMING_CONVENTION'
         WHEN object_name LIKE 'TBD%'
         THEN 'TO_BE_DROPPED'
         WHEN object_name LIKE 'TMP%'
         THEN 'TEMPORARY_TABLE'
         ELSE 'NON_STANDARD_NAMING_CONVENTION'
         END AS table_classification
  FROM   (
    SELECT owner,       
           object_name,
           CASE first_separator
           WHEN 0
           THEN NULL
           ELSE SUBSTR( object_name, 1, first_separator - 1 )
           END AS beginning,
           CASE last_separator
           WHEN 0
           THEN NULL
           ELSE SUBSTR( object_name, last_separator + 1 )
           END AS ending
    FROM   (
      SELECT owner,       
             object_name,
             INSTR( object_name, '_', 1 )  AS first_separator,
             INSTR( object_name, '_', -1 ) AS last_separator
      FROM   dba_objects
    WHERE  owner in ('DI_STAGE','DI_BDS','DI_EDW','DI_MART')   
      and    object_type='TABLE'
      group by owner,object_name
      order by owner desc 
    )
  );

Query 2 - Final Query - Worked

SELECT OWNER, 
       OBJECT_NAME, 
       REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_([A-Z0-9$]{1,})_.*','\1_\2')AS  BEGINNING, 
       REGEXP_SUBSTR(OBJECT_NAME, '[^_]*$') AS ENDING, 
      CASE 
        WHEN REGEXP_LIKE(OBJECT_NAME, '^D(S[CP]?|T)_.+_(T|(W0?[123]?))$')
       THEN 
         'STANDARD_NAMING_CONVENTION' 
         WHEN REGEXP_LIKE(OBJECT_NAME, '^DB_.+_(D|F|T|W|W1|W2|W3)$')
           THEN 
         'STANDARD_NAMING_CONVENTION' 
         WHEN REGEXP_LIKE(OBJECT_NAME, '^DE_.+_(D|F|L|T|W|W01|W02|W03|W1|W2|W3)$')
      THEN 
         'STANDARD_NAMING_CONVENTION'
         WHEN OBJECT_NAME LIKE 'TBD%' 
        THEN 
            'TO_BE_DROPPED' 
         WHEN OBJECT_NAME LIKE 'TMP%' 
        THEN 'TEMPORARY_TABLE' 
         WHEN REGEXP_LIKE(OBJECT_NAME, '^DM_.+_(D|F|L|T|W|W01|W02|W03|W1|W2|W3|A|AD|AM|AQ|AY)$') 
       THEN 
         'STANDARD_NAMING_CONVENTION' 
         ELSE 'NON_STANDARD_NAMING_CONVENTION' 
       END 
       AS TABLE_CLASSIFICATION 
FROM   DBA_OBJECTS 
WHERE  OWNER IN ('DI_STAGE', 'DI_BDS', 'DI_EDW', 'DI_MART') 
       AND OBJECT_TYPE = 'TABLE' 
GROUP  BY OWNER, 
          OBJECT_NAME 
ORDER  BY OWNER DESC, 
          OBJECT_NAME; 

EXPECTED RESULTS - Matched

SNO OWNER  OBJECT_NAME       BEGINNING ENDING  TABLE_CLASSIFICATION 
01  DI_BDS DB_PROD_DGGAA_D   DB_PROD   D       STANDARD_NAMING_CONVENTION
02  DI_BDS DB_CUST_DHHA_F    DB_CUST   F       STANDARD_NAMING_CONVENTION
03  DI_BDS DB_DHSHJA_HHSGS_T DB_DHSHJA T       STANDARD_NAMING_CONVENTION
04  DI_BDS DB_DHS_DHHA_W     DB_DHS    W       STANDARD_NAMING_CONVENTION
05  DI_BDS DB_GSG_DHHA_W01   DB_GSG    W01     STANDARD_NAMING_CONVENTION
06  DI_BDS DB_GFS_FHSH_W02   DB_GFS    W02     STANDARD_NAMING_CONVENTION
07  DI_BDS DB_FGS_FHS_W03    DB_FGS    W03     STANDARD_NAMING_CONVENTION
08  DI_BDS DB_DJJ_GSA_W1     DB_DJJ    W1      STANDARD_NAMING_CONVENTION
09  DI_BDS DB_DKS_SJ_W2      DB_DKS    W2      STANDARD_NAMING_CONVENTION
10  DI_BDS DB_DJA_DT_W3      DB_DJA    W3      STANDARD_NAMING_CONVENTION
11  DI_BDS DB_DHH_DG         DB_DHH    DG      NON_STANDARD_NAMING_CONV
12  DI_BDS DB_DNS_DRS_123    DB_DNS    123     NON_STANDARD_NAMING_CONV
13  DI_BDS DB_FHD_DRS_1      DB_FHD    1       NON_STANDARD_NAMING_CONV
14  DI_BDS DB_OKS_DRS_0      DB_OKS    0       NON_STANDARD_NAMING_CONV
15  DI_BDS DB_SKG_DRS_90     DB_SKG    90      NON_STANDARD_NAMING_CONV

Finally worked


Solution

  • It seems like this is a pattern matching issue, but the patterns are also conditional on the table owner. I make it something like this:

    with test (owner, table_name) as
         ( select 'DWH_STAGE_LAYER',    'DS_WHATEVER' from dual union all
           select 'DWH_STAGE_LAYER',    'DS_ANYTHING_F' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_CONS_REV_F' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_CONS_REV_F34' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_ORDER_ENTRY_W' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_ORDER_ENTRY_W12' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_A' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_A12' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AD' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AD11' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AM' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AQ' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AQ234' from dual union all
           select 'DWH_DATAMART_LAYER', 'DM_WHATEVER_AQ' from dual union all
           select 'DWH_EDW_LAYER',      'DM_WHATEVER_AQ' from dual union all
           select 'DWH_EDW_LAYER',      'DE_EMP_POOL_T' from dual union all
           select 'DWH_EDW_LAYER',      'DE_EMP_POOL_T33' from dual union all
           select 'DWH_EDW_LAYER',      'DE_PROD_RECORD_D' from dual union all
           select 'DWH_EDW_LAYER',      'DE_PROD_RECORD_D123' from dual
         )
    select t.owner, t.table_name
         , case
               when t.owner = 'DWH_STAGE_LAYER' and t.table_name like 'DS\_%' escape '\' then 'Y'
               when t.owner = 'DWH_EDW_LAYER' and regexp_like(t.table_name,'^DE_.+_[DFLTW]$') then 'Y'
               when t.owner = 'DWH_DATAMART_LAYER' and regexp_like(t.table_name,'^DM_.+_([DFLTW]|A[DMQY]?)$') then 'Y'
               else 'N'
           end as valid
    from   test t
    order by t.owner, t.table_name
    
    OWNER TABLE_NAME VALID
    DWH_DATAMART_LAYER DM_CONS_REV_F Y
    DWH_DATAMART_LAYER DM_CONS_REV_F34 N
    DWH_DATAMART_LAYER DM_ORDER_ENTRY_W Y
    DWH_DATAMART_LAYER DM_ORDER_ENTRY_W12 N
    DWH_DATAMART_LAYER DM_PROC_SALE_A Y
    DWH_DATAMART_LAYER DM_PROC_SALE_A12 N
    DWH_DATAMART_LAYER DM_PROC_SALE_AD Y
    DWH_DATAMART_LAYER DM_PROC_SALE_AD11 N
    DWH_DATAMART_LAYER DM_PROC_SALE_AM Y
    DWH_DATAMART_LAYER DM_PROC_SALE_AQ Y
    DWH_DATAMART_LAYER DM_PROC_SALE_AQ234 N
    DWH_DATAMART_LAYER DM_WHATEVER_AQ Y
    DWH_EDW_LAYER DE_EMP_POOL_T Y
    DWH_EDW_LAYER DE_EMP_POOL_T33 N
    DWH_EDW_LAYER DE_PROD_RECORD_D Y
    DWH_EDW_LAYER DE_PROD_RECORD_D123 N
    DWH_EDW_LAYER DM_WHATEVER_AQ N
    DWH_STAGE_LAYER DS_ANYTHING_F Y
    DWH_STAGE_LAYER DS_WHATEVER Y

    Note following comments:

    I still don't understand your business rules, but the regex for 'Begins with DS and ends with any of T, W, W01, W02, W03, W1, W2 or W3' can be written as

    ^DS_.+_(T|W|W01|W02|W03|W1|W2|W3)$
    

    which could be condensed to

    DS_.+_(T|(W0?[123]?))$
    

    If the prefix can be not just DS but also DSC or DSP with the same suffix rule, that would be

    ^DS[CP]?_.+_(T|(W0?[123]?))$
    

    If we want to add DT to the list, I make that:

    ^D(S[CP]?|T)_.+_(T|(W0?[123]?))$
    

    Alternatively, if you wanted to parse out the prefix and suffix into separate columns and use in lists to check them, I would group them together using something like:

    case
        when begins_with in ('DS', 'DSC', 'DSP', 'DT') and ends_with in ('T', 'W', 'W01', 'W02', 'W03', 'W1', 'W2' or 'W3') then 'Y'
        when begins_with in ('X', 'Y', 'Z') and ends with in ('BLAH1', 'BLAH2', 'BLAH3') then 'Y'
        when ...
        else 'N'
    end as valid