I am trying to create a crosstab that gives me a count of crash type by injury severity, and the data is in three tables (crashes, occupants & pedestrians). The occupants & pedestrians tables have a many-to-one relationship with crashes since there is typically more than one person involved in a crash. I made a view to handle assigning the crash severity to each crash_id based on the occupants and pedestrians tables. Now when I run the query below, the total count is correct, but the counts in each category are not always correct.
SELECT *
FROM crosstab(
'SELECT CONCAT(ct.crash_type_code, '' - '', ct.crash_type_desc) AS crash_type,
cs.crash_severity,
COALESCE(count(cs.crash_severity), 0) AS count
FROM (
SELECT DISTINCT crash_type_code, crash_type_desc FROM crash_type_lookup
) ct
CROSS JOIN (
SELECT DISTINCT crash_severity FROM crash_severity
) cs
LEFT JOIN crash_severity ON cs.crash_severity = crash_severity.crash_severity
AND crash_severity.crash_type = ct.crash_type_code
GROUP BY ct.crash_type_code, ct.crash_type_desc, cs.crash_severity
ORDER BY 1,2'
)
AS CrashType(crash_type text,
"01 - Fatal Injury" bigint,
"02 - Suspected Serious Injury" bigint,
"03 - Suspected Minor Injury" bigint,
"04 - Possible Injury" bigint,
"Property Damage Only" bigint);
Will return this:
crash_type | 01 - Fatal Injury | 02 - Suspected Serious Injury | 03 - Suspected Minor Injury | 04 - Possible Injury | Property Damage Only |
---|---|---|---|---|---|
00 - Unknown | 1 | 1 | 24 | ||
01 - Same Direction (Rear End) | 9 | 40 | 534 | 2784 | 10678 |
02 - Same Direction (Side Swipe) | 5 | 12 | 137 | 525 | 5725 |
03 - Right Angle | 10 | 41 | 540 | 1785 | 4825 |
04 - Opposite Direction (Head On, Angular) | 8 | 19 | 131 | 322 | 645 |
05 - Opposite Direction (Side Swipe) | 3 | 3 | 30 | 60 | 474 |
06 - Struck Parked Vehicle | 4 | 20 | 115 | 263 | 5763 |
07 - Left Turn/U Turn | 5 | 7 | 125 | 249 | 686 |
08 - Backing | 6 | 46 | 1559 | ||
09 - Encroachment | 12 | 17 | 154 | ||
10 - Overturned | 2 | 8 | 87 | 62 | 86 |
11 - Fixed Object | 33 | 95 | 585 | 880 | 4810 |
12 - Animal | 36 | 57 | 2247 | ||
13 - Pedestrian | 55 | 46 | 205 | 292 | 92 |
14 - Pedacyclist | 6 | 17 | 121 | 108 | 52 |
15 - Non-Fixed Object | 1 | 5 | 20 | 18 | 511 |
16 - Railcar - Vehicle | 2 | ||||
99 - Other | 2 | 3 | 36 | 20 | 288 |
But row 1, the 00 - Unknown category, should be null for the first two columns (Fatal Injury and Suspected Serious Injury) and 1, 1, 24 for the last three columns. So it seems it's just not accounting for the fact that the first two columns are null, and I can't figure out what I'm doing wrong.
Here is a sample of the data showing how the 00 - Unknown crash types should be distributed.
mercer_crashes=# SELECT c.crash_id, crash_type_code, crash_severity FROM crashes c LEFT JOIN crash_severity cs ON c.crash_id = cs.crash_id WHERE crash_type_code = '00';
crash_id | crash_type_code | crash_severity |
---|---|---|
2019110319-29497 | 00 | 03 - Suspected Minor Injury |
2017111117-004544 | 00 | 04 - Possible Injury |
2017110317-36014 | 00 | Property Damage Only |
2017110717-38624-AC | 00 | Property Damage Only |
2017111117-006380 | 00 | Property Damage Only |
2017111117-006610 | 00 | Property Damage Only |
2017111117001399 | 00 | Property Damage Only |
2017111117001803 | 00 | Property Damage Only |
2017111417-19645 | 00 | Property Damage Only |
2017111417-25979 | 00 | Property Damage Only |
201811022018-01727 | 00 | Property Damage Only |
2018110318-28221 | 00 | Property Damage Only |
2018111118004932 | 00 | Property Damage Only |
2018111118004944 | 00 | Property Damage Only |
2018111118012298 | 00 | Property Damage Only |
2018111118014205 | 00 | Property Damage Only |
2018111418-17386 | 00 | Property Damage Only |
2020110120-9355 | 00 | Property Damage Only |
2020110320-25261 | 00 | Property Damage Only |
2020110320-30145 | 00 | Property Damage Only |
2020111120003327 | 00 | Property Damage Only |
2020111120003940 | 00 | Property Damage Only |
2020111420-24013 | 00 | Property Damage Only |
2021110321-078180 | 00 | Property Damage Only |
2021111121006276 | 00 | Property Damage Only |
2021111421-13746 | 00 | Property Damage Only |
Other areas are probably incorrect also, I haven't checked everything since its obvious it's not working right. Any help would be much appreciated.
Here is a sample of the crash_severity view:
crash_id | crash_severity |
---|---|
2017110114-360 | Property Damage Only |
2017110117-10026 | Property Damage Only |
2017110117-10075 | Property Damage Only |
2017110117-10209 | 04 - Possible Injury |
2017110117-1022 | Property Damage Only |
2017110117-10231 | Property Damage Only |
2017110117-10253 | 03 - Suspected Minor Injury |
2017110117-10266 | Property Damage Only |
2017110117-1029 | 04 - Possible Injury |
2017110117-10318 | Property Damage Only |
2017110117-10329 | 03 - Suspected Minor Injury |
2017110117-10363 | Property Damage Only |
2017110117-10386 | Property Damage Only |
2017110117-10392 | 04 - Possible Injury |
2017110117-1040 | Property Damage Only |
2017110117-10451 | 04 - Possible Injury |
2017110117-10462 | Property Damage Only |
2017110117-10507 | Property Damage Only |
2017110117-10545 | Property Damage Only |
2017110117-10632 | Property Damage Only |
2017110117-1067 | 04 - Possible Injury |
2017110117-10678 | Property Damage Only |
2017110117-10714 | Property Damage Only |
2017110117-10715 | Property Damage Only |
2017110117-10717 | 04 - Possible Injury |
2017110117-10756 | Property Damage Only |
it seems it's just not accounting for the fact that the first two columns are null
Yes, it is not.
I can't figure out what I'm doing wrong
You COUNT the data from dimension. They are always non-null. You should COUNT the facts.
This should work correctly:
SELECT *
FROM crosstab(
'SELECT CONCAT(ct.crash_type_code, '' - '', ct.crash_type_desc) AS crash_type,
cs.crash_severity,
COALESCE(count(crash_severity.crash_severity), 0) AS count
FROM (
SELECT DISTINCT crash_type_code, crash_type_desc FROM crash_type_lookup
) ct
CROSS JOIN (
SELECT DISTINCT crash_severity FROM crash_severity
) cs
LEFT JOIN crash_severity ON cs.crash_severity = crash_severity.crash_severity
AND crash_severity.crash_type = ct.crash_type_code
GROUP BY ct.crash_type_code, ct.crash_type_desc, cs.crash_severity
ORDER BY 1,2'
)
AS CrashType(crash_type text,
"01 - Fatal Injury" bigint,
"02 - Suspected Serious Injury" bigint,
"03 - Suspected Minor Injury" bigint,
"04 - Possible Injury" bigint,
"Property Damage Only" bigint);