Let me explain with simple examples , basically we have some complex design of table naming conventions at least in stage schema there are 4 patterns.
Total LAYERS 4
All the below pattern specifically fall in STAGE Layer only
STAGE LAYER - 'DS_00','DSC_00','DSP_00','DT'
In this schema when looking for search pattern combination i need to consider the tables beginning with 'DS_00','DSC_00','DSP_00','DT' (the below explained patterns) and Ending with (T,W,W01,W02,W03,W1,W2,W3),so the table must be classified as STANADARD_NAMING_CONV else NON_STANDARD_NAMING_CONV
T - Technical W - Working Tables
Pattern 1
DS_00_to_99 (00_to_99 - is the number to identify which system the table is being loaded from source) DS_SYSTEM_NO_TABLENAME_ENDING SYSTEM_NO - 00_to_99 TABLENAME - full table name ENDING - not all cases the table is clssified as (T,W,W01,W02,W03,W1,W2,W3) so the numbers range as mentioned 00 to 99 is also same for other patterns 1 as well.
Pattern 2
DSC_00_to_99 (00_to_99 - is the number to identify which system the table is being loaded from source) DSC_SYSTEM_NO_TABLENAME_ENDING so the numbers range as mentioned 00 to 99 is also same for other patterns 2 as well.
Pattern 3
DSP_00_to_99 (00_to_99 - is the number to identify which system the table is being loaded from source) DS_SYSTEM_NO_TABLENAME_ENDING so the numbers range as mentioned 00 to 99 is also same for other patterns 3 as well.
Pattern 4
DT_ABC_FHSJ DS_TABLENAME_ENDING so the numbers range is not applicable for this pattern
BASIC DATA STORE LAYER - 'DS'
In this schema when looking for search pattern combination i need to consider the tables beginning with 'DB' and Ending with (D,F,L,T,W,W01,W02,W03,W1,W2,W3) ,so the table must be classified as STANADARD_NAMING_CONV else NON_STANDARD_NAMING_CONV
EDW LAYER - 'DE'
In this schema when looking for search pattern combination i need to consider the tables beginning with 'DB' and Ending with (D,F,L,T,W,W01,W02,W03,W1,W2,W3) ,so the table must be classified as STANADARD_NAMING_CONV else NON_STANDARD_NAMING_CONV and in this layer there are some other tables which begin with TBD_ must be classified as TO_BE_DROPPED and TMP_ must be classified as TEMPORARY_TABLE
MART LAYER - 'DM'
In this schema when looking for search pattern combination i need to consider the tables beginning with 'DB' and Ending with (D,F,L,T,W,W01,W02,W03,W1,W2,W3,A,AD,AM,AQ,AY) ,so the table must be classified as STANADARD_NAMING_CONV else NON_STANDARD_NAMING_CONV and in this layer there are some other tables which begin with TBD_ must be classified as TO_BE_DROPPED and TMP_ must be classified as TEMPORARY_TABLE
D
- Dimension tables
F
- Fact tables
L
- Lookup tables
T
- Technical tables
W
- Work tables
A
- Aggregate Fact Tables
AD
- Aggregate Fact Daily
AM
- Aggregate Fact Monthly
AQ
- Aggregate Fact Quarterly
AY
- Aggregate Fact Yearly
Query 1 - OLD Query
SELECT owner,
object_name,
beginning,
ending,
--count(*),
CASE
WHEN ( beginning, ending ) IN (
( 'DS', 'T' ),
( 'DS', 'W' ),
( 'DS', 'W01' ),
( 'DS', 'W02' ),
( 'DS', 'W03' ),
( 'DS', 'W1' ),
( 'DS', 'W2' ),
( 'DS', 'W3' ),
( 'DB', 'D' ),
( 'DB', 'F' ),
( 'DB', 'L' ),
( 'DB', 'T' ),
( 'DB', 'W' ),
( 'DB', 'W01' ),
( 'DB', 'W02' ),
( 'DB', 'W03' ),
( 'DB', 'W1' ),
( 'DB', 'W2' ),
( 'DB', 'W3' ),
( 'DE', 'D' ),
( 'DE', 'F' ),
( 'DE', 'L' ),
( 'DE', 'T' ),
( 'DE', 'W' ),
( 'DE', 'W01' ),
( 'DE', 'W02' ),
( 'DE', 'W03' ),
( 'DE', 'W1' ),
( 'DE', 'W2' ),
( 'DE', 'W3' ),
( 'DA', 'D' ),
( 'DA', 'F' ),
( 'DA', 'L' ),
( 'DA', 'T' ),
( 'DA', 'W' ),
( 'DA', 'W01' ),
( 'DA', 'W02' ),
( 'DA', 'W03' ),
( 'DA', 'W1' ),
( 'DA', 'W2' ),
( 'DA', 'W3' ),
( 'DA', 'A' ),
( 'DA', 'AD' ),
( 'DA', 'AM' ),
( 'DA', 'AQ' ),
( 'DA', 'AY' )
)
THEN 'STANDARD_NAMING_CONVENTION'
WHEN object_name LIKE 'TBD%'
THEN 'TO_BE_DROPPED'
WHEN object_name LIKE 'TMP%'
THEN 'TEMPORARY_TABLE'
ELSE 'NON_STANDARD_NAMING_CONVENTION'
END AS table_classification
FROM (
SELECT owner,
object_name,
CASE first_separator
WHEN 0
THEN NULL
ELSE SUBSTR( object_name, 1, first_separator - 1 )
END AS beginning,
CASE last_separator
WHEN 0
THEN NULL
ELSE SUBSTR( object_name, last_separator + 1 )
END AS ending
FROM (
SELECT owner,
object_name,
INSTR( object_name, '_', 1 ) AS first_separator,
INSTR( object_name, '_', -1 ) AS last_separator
FROM dba_objects
WHERE owner in ('DI_STAGE','DI_BDS','DI_EDW','DI_MART')
and object_type='TABLE'
group by owner,object_name
order by owner desc
)
);
Query 2 - Final Query - Worked
SELECT OWNER,
OBJECT_NAME,
REGEXP_REPLACE(OBJECT_NAME, '^([A-Z0-9$]{1,})_([A-Z0-9$]{1,})_.*','\1_\2')AS BEGINNING,
REGEXP_SUBSTR(OBJECT_NAME, '[^_]*$') AS ENDING,
CASE
WHEN REGEXP_LIKE(OBJECT_NAME, '^D(S[CP]?|T)_.+_(T|(W0?[123]?))$')
THEN
'STANDARD_NAMING_CONVENTION'
WHEN REGEXP_LIKE(OBJECT_NAME, '^DB_.+_(D|F|T|W|W1|W2|W3)$')
THEN
'STANDARD_NAMING_CONVENTION'
WHEN REGEXP_LIKE(OBJECT_NAME, '^DE_.+_(D|F|L|T|W|W01|W02|W03|W1|W2|W3)$')
THEN
'STANDARD_NAMING_CONVENTION'
WHEN OBJECT_NAME LIKE 'TBD%'
THEN
'TO_BE_DROPPED'
WHEN OBJECT_NAME LIKE 'TMP%'
THEN 'TEMPORARY_TABLE'
WHEN REGEXP_LIKE(OBJECT_NAME, '^DM_.+_(D|F|L|T|W|W01|W02|W03|W1|W2|W3|A|AD|AM|AQ|AY)$')
THEN
'STANDARD_NAMING_CONVENTION'
ELSE 'NON_STANDARD_NAMING_CONVENTION'
END
AS TABLE_CLASSIFICATION
FROM DBA_OBJECTS
WHERE OWNER IN ('DI_STAGE', 'DI_BDS', 'DI_EDW', 'DI_MART')
AND OBJECT_TYPE = 'TABLE'
GROUP BY OWNER,
OBJECT_NAME
ORDER BY OWNER DESC,
OBJECT_NAME;
EXPECTED RESULTS - Matched
SNO OWNER OBJECT_NAME BEGINNING ENDING TABLE_CLASSIFICATION
01 DI_BDS DB_PROD_DGGAA_D DB_PROD D STANDARD_NAMING_CONVENTION
02 DI_BDS DB_CUST_DHHA_F DB_CUST F STANDARD_NAMING_CONVENTION
03 DI_BDS DB_DHSHJA_HHSGS_T DB_DHSHJA T STANDARD_NAMING_CONVENTION
04 DI_BDS DB_DHS_DHHA_W DB_DHS W STANDARD_NAMING_CONVENTION
05 DI_BDS DB_GSG_DHHA_W01 DB_GSG W01 STANDARD_NAMING_CONVENTION
06 DI_BDS DB_GFS_FHSH_W02 DB_GFS W02 STANDARD_NAMING_CONVENTION
07 DI_BDS DB_FGS_FHS_W03 DB_FGS W03 STANDARD_NAMING_CONVENTION
08 DI_BDS DB_DJJ_GSA_W1 DB_DJJ W1 STANDARD_NAMING_CONVENTION
09 DI_BDS DB_DKS_SJ_W2 DB_DKS W2 STANDARD_NAMING_CONVENTION
10 DI_BDS DB_DJA_DT_W3 DB_DJA W3 STANDARD_NAMING_CONVENTION
11 DI_BDS DB_DHH_DG DB_DHH DG NON_STANDARD_NAMING_CONV
12 DI_BDS DB_DNS_DRS_123 DB_DNS 123 NON_STANDARD_NAMING_CONV
13 DI_BDS DB_FHD_DRS_1 DB_FHD 1 NON_STANDARD_NAMING_CONV
14 DI_BDS DB_OKS_DRS_0 DB_OKS 0 NON_STANDARD_NAMING_CONV
15 DI_BDS DB_SKG_DRS_90 DB_SKG 90 NON_STANDARD_NAMING_CONV
Finally worked
It seems like this is a pattern matching issue, but the patterns are also conditional on the table owner. I make it something like this:
with test (owner, table_name) as
( select 'DWH_STAGE_LAYER', 'DS_WHATEVER' from dual union all
select 'DWH_STAGE_LAYER', 'DS_ANYTHING_F' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_CONS_REV_F' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_CONS_REV_F34' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_ORDER_ENTRY_W' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_ORDER_ENTRY_W12' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_A' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_A12' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AD' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AD11' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AM' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AQ' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_PROC_SALE_AQ234' from dual union all
select 'DWH_DATAMART_LAYER', 'DM_WHATEVER_AQ' from dual union all
select 'DWH_EDW_LAYER', 'DM_WHATEVER_AQ' from dual union all
select 'DWH_EDW_LAYER', 'DE_EMP_POOL_T' from dual union all
select 'DWH_EDW_LAYER', 'DE_EMP_POOL_T33' from dual union all
select 'DWH_EDW_LAYER', 'DE_PROD_RECORD_D' from dual union all
select 'DWH_EDW_LAYER', 'DE_PROD_RECORD_D123' from dual
)
select t.owner, t.table_name
, case
when t.owner = 'DWH_STAGE_LAYER' and t.table_name like 'DS\_%' escape '\' then 'Y'
when t.owner = 'DWH_EDW_LAYER' and regexp_like(t.table_name,'^DE_.+_[DFLTW]$') then 'Y'
when t.owner = 'DWH_DATAMART_LAYER' and regexp_like(t.table_name,'^DM_.+_([DFLTW]|A[DMQY]?)$') then 'Y'
else 'N'
end as valid
from test t
order by t.owner, t.table_name
OWNER | TABLE_NAME | VALID |
---|---|---|
DWH_DATAMART_LAYER | DM_CONS_REV_F | Y |
DWH_DATAMART_LAYER | DM_CONS_REV_F34 | N |
DWH_DATAMART_LAYER | DM_ORDER_ENTRY_W | Y |
DWH_DATAMART_LAYER | DM_ORDER_ENTRY_W12 | N |
DWH_DATAMART_LAYER | DM_PROC_SALE_A | Y |
DWH_DATAMART_LAYER | DM_PROC_SALE_A12 | N |
DWH_DATAMART_LAYER | DM_PROC_SALE_AD | Y |
DWH_DATAMART_LAYER | DM_PROC_SALE_AD11 | N |
DWH_DATAMART_LAYER | DM_PROC_SALE_AM | Y |
DWH_DATAMART_LAYER | DM_PROC_SALE_AQ | Y |
DWH_DATAMART_LAYER | DM_PROC_SALE_AQ234 | N |
DWH_DATAMART_LAYER | DM_WHATEVER_AQ | Y |
DWH_EDW_LAYER | DE_EMP_POOL_T | Y |
DWH_EDW_LAYER | DE_EMP_POOL_T33 | N |
DWH_EDW_LAYER | DE_PROD_RECORD_D | Y |
DWH_EDW_LAYER | DE_PROD_RECORD_D123 | N |
DWH_EDW_LAYER | DM_WHATEVER_AQ | N |
DWH_STAGE_LAYER | DS_ANYTHING_F | Y |
DWH_STAGE_LAYER | DS_WHATEVER | Y |
Note following comments:
I still don't understand your business rules, but the regex for 'Begins with DS and ends with any of T, W, W01, W02, W03, W1, W2 or W3' can be written as
^DS_.+_(T|W|W01|W02|W03|W1|W2|W3)$
which could be condensed to
DS_.+_(T|(W0?[123]?))$
If the prefix can be not just DS
but also DSC
or DSP
with the same suffix rule, that would be
^DS[CP]?_.+_(T|(W0?[123]?))$
If we want to add DT
to the list, I make that:
^D(S[CP]?|T)_.+_(T|(W0?[123]?))$
Alternatively, if you wanted to parse out the prefix and suffix into separate columns and use in
lists to check them, I would group them together using something like:
case
when begins_with in ('DS', 'DSC', 'DSP', 'DT') and ends_with in ('T', 'W', 'W01', 'W02', 'W03', 'W1', 'W2' or 'W3') then 'Y'
when begins_with in ('X', 'Y', 'Z') and ends with in ('BLAH1', 'BLAH2', 'BLAH3') then 'Y'
when ...
else 'N'
end as valid