I am new to use "PARTITION BY" in oracle. In the table structure that stored multiple addresses for a person as below, how to use "Partition By" in oracle SQL to
PERSON_ID | ADDR_ID | TYPE | EFF_START_DATE | EFF_END_DATE |
---|---|---|---|---|
6207445 | 7390814 | HOME | 2024-02-26 | 4712-12-31 |
6207445 | 0042548 | HOME | 2023-05-15 | 4712-12-31 |
6207445 | 6462255 | HOME | 2022-01-31 | 4712-12-31 |
6207445 | 6462287 | 2022-01-31 | 4712-12-31 | |
6207445 | 0042571 | 2023-05-15 | 4712-12-31 | |
1111111 | 0042578 | 2023-05-15 | 4712-12-31 |
Expected Result:
Person_ID | Type | Rec_Count |
---|---|---|
6207445 | HOME | 3 |
6207445 | 2 | |
1111111 | 1 |
However, I run the script as below and it gave me different result
SELECT person_id, type,
COUNT(*) OVER (PARTITION BY person_id) AS rec_count
FROM PER_PERSON_ADDR_USAGES_F
WHERE
person_id = '6207445'
ORDER BY rec_count desc
PERSON_ID | TYPE | REC_COUNT |
---|---|---|
6207445 | HOME | 5 |
6207445 | 5 | |
6207445 | HOME | 5 |
6207445 | 5 | |
6207445 | HOME | 5 |
What's wrong in my SQL?
Besides, What can I do if I want to get the latest Addr_ID per person per type?
SELECT person_id, type,
COUNT(*) OVER (PARTITION BY person_id) AS rec_count
FROM PER_PERSON_ADDR_USAGES_F
WHERE
to_number(rn) > 1
ORDER BY rec_count desc
I try this but failed as below:
Query Execution started at Mon Feb 26 15:58:14 AEDT 2024
Error while executing the query:
ORA-06550: line 15, column 13:
PL/SQL: ORA-00904: "REC_COUNT": invalid identifier
ORA-06550: line 7, column 17:
PL/SQL: SQL Statement ignored
Any idea what's wrong in my SQL?
Analytic functions work over the entire result set and return exactly the same number of rows of output as there are of input; if you specify PARTITION BY
then the function will split the result set up into different partitions and calculate the result within that partition but there will still be a one-to-one correspondence between the input and output rows.
If you want to aggregate so that you only produce one row per partition then do not use analytic functions and use GROUP BY
:
SELECT person_id,
type,
COUNT(*) AS rec_count
FROM PER_PERSON_ADDR_USAGES_F
GROUP BY person_id, type
ORDER BY rec_count DESC
Which, for the sample data:
CREATE TABLE PER_PERSON_ADDR_USAGES_F (PERSON_ID, ADDR_ID, TYPE, EFF_START_DATE, EFF_END_DATE) AS
SELECT 6207445, '7390814', 'HOME', DATE '2024-02-26', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 6207445, '0042548', 'HOME', DATE '2023-05-15', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 6207445, '6462255', 'HOME', DATE '2022-01-31', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 6207445, '6462287', 'MAIL', DATE '2022-01-31', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 6207445, '0042571', 'MAIL', DATE '2023-05-15', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 1111111, '0042578', 'MAIL', DATE '2023-05-15', DATE '4712-12-31' FROM DUAL;
Outputs:
PERSON_ID | TYPE | REC_COUNT |
---|---|---|
6207445 | HOME | 3 |
6207445 | 2 | |
1111111 | 1 |
If you want to find the latest address then use KEEP (DENSE_RANK LAST ...)
:
SELECT person_id,
type,
MAX(addr_id) KEEP (DENSE_RANK LAST ORDER BY eff_start_date) AS addr_id,
COUNT(*) AS rec_count
FROM PER_PERSON_ADDR_USAGES_F
GROUP BY person_id, type
ORDER BY rec_count DESC
Which, for the sample data, outputs:
PERSON_ID | TYPE | ADDR_ID | REC_COUNT |
---|---|---|---|
6207445 | HOME | 7390814 | 3 |
6207445 | 0042571 | 2 | |
1111111 | 0042578 | 1 |
Alternatively, use analytic functions and then filter to only get the first row of each partition:
SELECT person_id,
type,
addr_id,
rec_count
FROM (
SELECT person_id,
type,
addr_id,
ROW_NUMBER() OVER (PARTITION BY person_id, type ORDER BY eff_start_date DESC)
AS rn,
COUNT(*) OVER (PARTITION BY person_id, type) AS rec_count
FROM PER_PERSON_ADDR_USAGES_F
)
WHERE rn = 1
ORDER BY rec_count DESC
Which outputs the same as the previous query.