sqlpostgresqlpivot-table

SQL Crosstab not showing correct counts


I am trying to create a crosstab that gives me a count of crash type by injury severity, and the data is in three tables (crashes, occupants & pedestrians). The occupants & pedestrians tables have a many-to-one relationship with crashes since there is typically more than one person involved in a crash. I made a view to handle assigning the crash severity to each crash_id based on the occupants and pedestrians tables. Now when I run the query below, the total count is correct, but the counts in each category are not always correct.

SELECT *
FROM crosstab(
    'SELECT CONCAT(ct.crash_type_code, '' - '', ct.crash_type_desc) AS crash_type, 
            cs.crash_severity, 
            COALESCE(count(cs.crash_severity), 0) AS count 
    FROM (
        SELECT DISTINCT crash_type_code, crash_type_desc FROM crash_type_lookup
    ) ct
    CROSS JOIN (
        SELECT DISTINCT crash_severity FROM crash_severity
    ) cs
    LEFT JOIN crash_severity ON cs.crash_severity = crash_severity.crash_severity
                              AND crash_severity.crash_type = ct.crash_type_code
    GROUP BY ct.crash_type_code, ct.crash_type_desc, cs.crash_severity
    ORDER BY 1,2'
)
AS CrashType(crash_type text,
            "01 - Fatal Injury" bigint,
            "02 - Suspected Serious Injury" bigint,
            "03 - Suspected Minor Injury" bigint,
            "04 - Possible Injury" bigint,
            "Property Damage Only" bigint);

Will return this:

crash_type 01 - Fatal Injury 02 - Suspected Serious Injury 03 - Suspected Minor Injury 04 - Possible Injury Property Damage Only
00 - Unknown 1 1 24
01 - Same Direction (Rear End) 9 40 534 2784 10678
02 - Same Direction (Side Swipe) 5 12 137 525 5725
03 - Right Angle 10 41 540 1785 4825
04 - Opposite Direction (Head On, Angular) 8 19 131 322 645
05 - Opposite Direction (Side Swipe) 3 3 30 60 474
06 - Struck Parked Vehicle 4 20 115 263 5763
07 - Left Turn/U Turn 5 7 125 249 686
08 - Backing 6 46 1559
09 - Encroachment 12 17 154
10 - Overturned 2 8 87 62 86
11 - Fixed Object 33 95 585 880 4810
12 - Animal 36 57 2247
13 - Pedestrian 55 46 205 292 92
14 - Pedacyclist 6 17 121 108 52
15 - Non-Fixed Object 1 5 20 18 511
16 - Railcar - Vehicle 2
99 - Other 2 3 36 20 288

But row 1, the 00 - Unknown category, should be null for the first two columns (Fatal Injury and Suspected Serious Injury) and 1, 1, 24 for the last three columns. So it seems it's just not accounting for the fact that the first two columns are null, and I can't figure out what I'm doing wrong.

Here is a sample of the data showing how the 00 - Unknown crash types should be distributed.

mercer_crashes=# SELECT c.crash_id, crash_type_code, crash_severity FROM crashes c LEFT JOIN crash_severity cs ON c.crash_id = cs.crash_id WHERE crash_type_code = '00';

crash_id crash_type_code crash_severity
2019110319-29497 00 03 - Suspected Minor Injury
2017111117-004544 00 04 - Possible Injury
2017110317-36014 00 Property Damage Only
2017110717-38624-AC 00 Property Damage Only
2017111117-006380 00 Property Damage Only
2017111117-006610 00 Property Damage Only
2017111117001399 00 Property Damage Only
2017111117001803 00 Property Damage Only
2017111417-19645 00 Property Damage Only
2017111417-25979 00 Property Damage Only
201811022018-01727 00 Property Damage Only
2018110318-28221 00 Property Damage Only
2018111118004932 00 Property Damage Only
2018111118004944 00 Property Damage Only
2018111118012298 00 Property Damage Only
2018111118014205 00 Property Damage Only
2018111418-17386 00 Property Damage Only
2020110120-9355 00 Property Damage Only
2020110320-25261 00 Property Damage Only
2020110320-30145 00 Property Damage Only
2020111120003327 00 Property Damage Only
2020111120003940 00 Property Damage Only
2020111420-24013 00 Property Damage Only
2021110321-078180 00 Property Damage Only
2021111121006276 00 Property Damage Only
2021111421-13746 00 Property Damage Only

Other areas are probably incorrect also, I haven't checked everything since its obvious it's not working right. Any help would be much appreciated.

Here is a sample of the crash_severity view:

crash_id crash_severity
2017110114-360 Property Damage Only
2017110117-10026 Property Damage Only
2017110117-10075 Property Damage Only
2017110117-10209 04 - Possible Injury
2017110117-1022 Property Damage Only
2017110117-10231 Property Damage Only
2017110117-10253 03 - Suspected Minor Injury
2017110117-10266 Property Damage Only
2017110117-1029 04 - Possible Injury
2017110117-10318 Property Damage Only
2017110117-10329 03 - Suspected Minor Injury
2017110117-10363 Property Damage Only
2017110117-10386 Property Damage Only
2017110117-10392 04 - Possible Injury
2017110117-1040 Property Damage Only
2017110117-10451 04 - Possible Injury
2017110117-10462 Property Damage Only
2017110117-10507 Property Damage Only
2017110117-10545 Property Damage Only
2017110117-10632 Property Damage Only
2017110117-1067 04 - Possible Injury
2017110117-10678 Property Damage Only
2017110117-10714 Property Damage Only
2017110117-10715 Property Damage Only
2017110117-10717 04 - Possible Injury
2017110117-10756 Property Damage Only

Solution

  • it seems it's just not accounting for the fact that the first two columns are null

    Yes, it is not.

    I can't figure out what I'm doing wrong

    You COUNT the data from dimension. They are always non-null. You should COUNT the facts.

    This should work correctly:

    SELECT *
    FROM crosstab(
        'SELECT CONCAT(ct.crash_type_code, '' - '', ct.crash_type_desc) AS crash_type, 
                cs.crash_severity, 
                COALESCE(count(crash_severity.crash_severity), 0) AS count 
        FROM (
            SELECT DISTINCT crash_type_code, crash_type_desc FROM crash_type_lookup
        ) ct
        CROSS JOIN (
            SELECT DISTINCT crash_severity FROM crash_severity
        ) cs
        LEFT JOIN crash_severity ON cs.crash_severity = crash_severity.crash_severity
                                  AND crash_severity.crash_type = ct.crash_type_code
        GROUP BY ct.crash_type_code, ct.crash_type_desc, cs.crash_severity
        ORDER BY 1,2'
    )
    AS CrashType(crash_type text,
                "01 - Fatal Injury" bigint,
                "02 - Suspected Serious Injury" bigint,
                "03 - Suspected Minor Injury" bigint,
                "04 - Possible Injury" bigint,
                "Property Damage Only" bigint);