sqloracle-databasepartition

How to use "Partition By" in Oracle SQL


I am new to use "PARTITION BY" in oracle. In the table structure that stored multiple addresses for a person as below, how to use "Partition By" in oracle SQL to

  1. get the count of record per type per person and
  2. get the latest Addr_ID per person per type
PERSON_ID ADDR_ID TYPE EFF_START_DATE EFF_END_DATE
6207445 7390814 HOME 2024-02-26 4712-12-31
6207445 0042548 HOME 2023-05-15 4712-12-31
6207445 6462255 HOME 2022-01-31 4712-12-31
6207445 6462287 MAIL 2022-01-31 4712-12-31
6207445 0042571 MAIL 2023-05-15 4712-12-31
1111111 0042578 MAIL 2023-05-15 4712-12-31

Expected Result:

Person_ID Type Rec_Count
6207445 HOME 3
6207445 MAIL 2
1111111 MAIL 1

However, I run the script as below and it gave me different result

SELECT person_id, type, 
      COUNT(*) OVER (PARTITION BY person_id) AS rec_count
FROM PER_PERSON_ADDR_USAGES_F 
WHERE
    person_id = '6207445'
ORDER BY rec_count desc
PERSON_ID TYPE REC_COUNT
6207445 HOME 5
6207445 MAIL 5
6207445 HOME 5
6207445 MAIL 5
6207445 HOME 5

What's wrong in my SQL?

Besides, What can I do if I want to get the latest Addr_ID per person per type?

SELECT person_id, type, 
    COUNT(*) OVER (PARTITION BY person_id) AS rec_count
FROM PER_PERSON_ADDR_USAGES_F 
WHERE
    to_number(rn) > 1
ORDER BY rec_count desc

I try this but failed as below:

Query Execution started at Mon Feb 26 15:58:14 AEDT 2024
Error while executing the query:
ORA-06550: line 15, column 13:
PL/SQL: ORA-00904: "REC_COUNT": invalid identifier
ORA-06550: line 7, column 17:
PL/SQL: SQL Statement ignored

Any idea what's wrong in my SQL?


Solution

  • Analytic functions work over the entire result set and return exactly the same number of rows of output as there are of input; if you specify PARTITION BY then the function will split the result set up into different partitions and calculate the result within that partition but there will still be a one-to-one correspondence between the input and output rows.

    If you want to aggregate so that you only produce one row per partition then do not use analytic functions and use GROUP BY:

    SELECT person_id,
           type, 
           COUNT(*) AS rec_count
    FROM   PER_PERSON_ADDR_USAGES_F 
    GROUP BY person_id, type
    ORDER BY rec_count DESC
    

    Which, for the sample data:

    CREATE TABLE PER_PERSON_ADDR_USAGES_F (PERSON_ID, ADDR_ID, TYPE, EFF_START_DATE, EFF_END_DATE) AS
    SELECT 6207445, '7390814', 'HOME', DATE '2024-02-26', DATE '4712-12-31' FROM DUAL UNION ALL
    SELECT 6207445, '0042548', 'HOME', DATE '2023-05-15', DATE '4712-12-31' FROM DUAL UNION ALL
    SELECT 6207445, '6462255', 'HOME', DATE '2022-01-31', DATE '4712-12-31' FROM DUAL UNION ALL
    SELECT 6207445, '6462287', 'MAIL', DATE '2022-01-31', DATE '4712-12-31' FROM DUAL UNION ALL
    SELECT 6207445, '0042571', 'MAIL', DATE '2023-05-15', DATE '4712-12-31' FROM DUAL UNION ALL
    SELECT 1111111, '0042578', 'MAIL', DATE '2023-05-15', DATE '4712-12-31' FROM DUAL;
    

    Outputs:

    PERSON_ID TYPE REC_COUNT
    6207445 HOME 3
    6207445 MAIL 2
    1111111 MAIL 1

    If you want to find the latest address then use KEEP (DENSE_RANK LAST ...):

    SELECT person_id,
           type, 
           MAX(addr_id) KEEP (DENSE_RANK LAST ORDER BY eff_start_date) AS addr_id,
           COUNT(*) AS rec_count
    FROM   PER_PERSON_ADDR_USAGES_F 
    GROUP BY person_id, type
    ORDER BY rec_count DESC
    

    Which, for the sample data, outputs:

    PERSON_ID TYPE ADDR_ID REC_COUNT
    6207445 HOME 7390814 3
    6207445 MAIL 0042571 2
    1111111 MAIL 0042578 1

    Alternatively, use analytic functions and then filter to only get the first row of each partition:

    SELECT person_id,
           type, 
           addr_id,
           rec_count
    FROM   (
      SELECT person_id,
             type, 
             addr_id,
             ROW_NUMBER() OVER (PARTITION BY person_id, type ORDER BY eff_start_date DESC)
               AS rn,
             COUNT(*) OVER (PARTITION BY person_id, type) AS rec_count
      FROM   PER_PERSON_ADDR_USAGES_F 
    )
    WHERE  rn = 1
    ORDER BY rec_count DESC
    

    Which outputs the same as the previous query.

    fiddle