Tags: sql, postgresql, average, aggregate-filter

How to find the average of a subset of values from a table in PostgreSQL?


Forgive me, but I am new to PostgreSQL and have been tasked with updating some fields in some tables. One particular field is the average decision time, shown below:

CASE WHEN COUNT(tdrm.dbid) > 0
THEN TO_CHAR((AVG(tdrm.total_processing_time) || ' millisecond')::interval, 'MI:SS.MS')
ELSE '00:00.000'
END AS average_decision_time

Where COUNT(tdrm.dbid) is items_seen. The issue with this logic is that we want to exclude the total processing time from the average for items that have an abort flag equal to 'AF_ABORT'.

This is what I am trying to do:

CASE WHEN COUNT(tdrm.dbid) > 0
THEN TO_CHAR((AVG(COUNT(CASE WHEN tdrm.tdr_abort_flag!=AF_ABORT THEN tdrm.total_processing_time END)) || ' millisecond')::interval, 'MI:SS.MS')
ELSE '00:00.000'
END AS average_decision_time

But I am getting the error below:

ERROR: aggregate function calls cannot be nested
LINE 64: THEN TO_CHAR((AVG(COUNT(CASE WHEN tdrm.tdr_abort_flag!=A...

Am I on the right track or is there a simpler way of doing this?

Full SQL below:

SELECT s.*,
CASE WHEN agent_event.event_code = 'data_download' THEN 'DL'
WHEN agent_event.event_code = 'mode' THEN 'Mode'
ELSE agent_event.event_code
END AS userAction
FROM
(
WITH report_constants AS (
-- Decisions from DetectionReport.h
SELECT
0::int as AD_UNKNOWN,
1::int as AD_ALARM,
2::int as AD_CLEAR,
-- Flags from DetectionReport.h
0::int as AF_UNKNOWN,
1::int as AF_ABORT,
2::int as AF_SUCCESS,
-- UI values for Decisions are DIFFERENT
0::int as UI_AD_ALL,
1::int as UI_AD_CLEAR,
2::int as UI_AD_ALARM,
3::int as UI_AD_UNKNOWN,
--
0::int as AGENT_TYPE_SCANNER,
1::int as AGENT_TYPE_OSR,
2::int as AGENT_TYPE_DIVERTER,
3::int as AGENT_TYPE_TIP,
4::int as AGENT_TYPE_SEARCH,
-- Operation Mode from Module.h
0::int as OPERATION_MODE_UNKNOWN,
1::int as OPERATION_MODE_SCAN,
2::int as OPERATION_MODE_OTHER
)
SELECT
nss_user.username AS user_name,
reg_login.action_time AS login_action_time,
reg_logout.action_time AS logout_action_time,
to_char(reg_login.action_time, 'MM-DD-YYYY') AS login_date,
to_char(reg_login.action_time, 'HH24:MI:SS') AS login_time,
CASE WHEN reg_logout.action_time IS NULL THEN '' ELSE 
to_char(reg_logout.action_time, 'MM-DD-YYYY') END AS logout_date,
CASE WHEN reg_logout.action_time IS NULL THEN '' ELSE 
to_char(reg_logout.action_time, 'HH24:MI:SS') END AS logout_time,
CASE WHEN user_level.name LIKE 'Level %' THEN SUBSTRING(user_level.name from 7) ELSE user_level.name END AS userAccess,
COUNT(tdrm.dbid) AS items_seen,
CASE WHEN COUNT(tdrm.dbid) > 0
THEN ROUND(100.0 * COUNT(CASE WHEN tdrm.tdr_abort_flag=AF_SUCCESS
  AND tdrm.tdr_alarm_decision=AD_CLEAR THEN 1 END) / COUNT(tdrm.dbid), 2)
ELSE 0.00
END AS clear_rate,
COUNT(CASE WHEN (tdrm.tdr_abort_flag=AF_SUCCESS 
AND tdrm.tdr_alarm_decision=AD_UNKNOWN) 
  OR tdrm.tdr_abort_flag=AF_ABORT THEN 1 END) AS operator_timeout,
CASE WHEN COUNT(tdrm.dbid) > 0
THEN ROUND(100.0 * COUNT(CASE WHEN tdrm.tdr_abort_flag=AF_SUCCESS
  AND tdrm.tdr_alarm_decision=AD_ALARM THEN 1 END) / COUNT(tdrm.dbid), 2)
ELSE 0.00
END AS suspect_rate,
CASE WHEN COUNT(tdrm.dbid) > 0
THEN ROUND(100.0 * COUNT(CASE WHEN 
  (tdrm.tdr_abort_flag=AF_SUCCESS AND tdrm.tdr_alarm_decision=AD_UNKNOWN) 
    OR tdrm.tdr_abort_flag=AF_ABORT THEN 1 END) / COUNT(tdrm.dbid), 2)
ELSE 0.00
END AS operatorNoDecisionRate,
CASE WHEN COUNT(tdrm.dbid) > 0
THEN TO_CHAR((AVG(CASE WHEN tdrm.tdr_abort_flag!=AF_ABORT THEN (tdrm.total_processing_time) END) || ' millisecond')::interval, 'MI:SS.MS')
ELSE '00:00.000'
END AS average_decision_time,
v2_module.dbid AS v2_gem_dbid
FROM report_constants CROSS JOIN auth_event
INNER JOIN registration_event AS reg_login
ON reg_login.credential_id=auth_event.credential_id
AND reg_login.event_type=3
LEFT OUTER JOIN registration_event AS reg_logout
ON reg_logout.credential_id=auth_event.credential_id
AND reg_logout.event_type=4
INNER JOIN nss_user ON nss_user.dbid=auth_event.nss_user_dbid
INNER JOIN user_level ON user_level.dbid=nss_user.user_level_dbid
LEFT OUTER JOIN bag_tdr ON nss_user.dbid=bag_tdr.author_user_dbid
AND (item_tdr.agent_type=AGENT_TYPE_OSR OR 
item_tdr.agent_type=AGENT_TYPE_SEARCH)
AND item_tdr.author_credential_id=auth_event.credential_id
LEFT OUTER JOIN v2_module AS tdrm ON 
item_tdr.v2_module_dbid=tdrm.dbid 
LEFT OUTER JOIN v2_general_equipment_module
ON v2_general_equipment_module.dbid=reg_login.v2_gem_dbid
WHERE auth_event.credential_id IS NOT NULL
AND auth_event.auth_event_type=1
AND ($P{userid} = 'ALL' OR $P{userid} = nss_user.username)
AND item_tdr.created_date >= $P{fromdate}
AND item_tdr.created_date <= $P{todate}
AND v2_module.operation_mode != OPERATION_MODE_OTHER
GROUP BY nss_user.username, user_level.name, reg_login.agent_type, 
reg_login.action_time, reg_logout.action_time, 
v2_module.dbid
) s
LEFT OUTER JOIN agent_event
ON s.v2_dbid=agent_event.v2_dbid
AND agent_event.event_timestamp >= s.login_action_time
AND (s.logout_action_time IS NULL OR agent_event.event_timestamp <= s.logout_action_time)
ORDER BY s.login_action_time

Solution

  • we want to exclude the total processing time from the average for items that have an abort flag equal to 'AF_ABORT'.

    CASE WHEN count(tdrm.dbid) > 0
       THEN to_char(avg(tdrm.total_processing_time)
                       FILTER (WHERE tdrm.tdr_abort_flag IS DISTINCT FROM 'AF_ABORT') -- ①, ②
                  * interval '1 millisecond'  -- ③
                  , 'MI:SS.MS')
       ELSE '00:00.000'
    END AS average_decision_time
    

    ① The key element for implementing your filter is the aggregate FILTER clause, available since PostgreSQL 9.4.

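    ② If tdrm.tdr_abort_flag can be NULL (missing info), we need tdrm.tdr_abort_flag IS DISTINCT FROM 'AF_ABORT' to keep those rows in the average: tdr_abort_flag <> 'AF_ABORT' evaluates to NULL for them, and FILTER only passes rows where the condition is true. Else we can simplify to tdrm.tdr_abort_flag <> 'AF_ABORT'. (In your full query, AF_ABORT looks like an integer column of the report_constants CTE; if so, reference it without quotes.)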

    ③ Multiplying by interval '1 millisecond' is substantially faster than concatenating a string and casting it to interval.
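
    To make notes ① to ③ concrete, here is a minimal sketch on made-up data. The CTE tdrm with its four rows and the text flag values are assumptions for illustration only, not your real table:

    ```sql
    -- Hypothetical sample rows: (dbid, tdr_abort_flag, total_processing_time in ms)
    WITH tdrm(dbid, tdr_abort_flag, total_processing_time) AS (
       VALUES (1, 'AF_SUCCESS', 1000)
            , (2, 'AF_SUCCESS', 2000)
            , (3, 'AF_ABORT'  , 9000)
            , (4, NULL        , 3000)   -- flag missing
       )
    SELECT avg(total_processing_time)                                  AS avg_all       -- 3750: all four rows
         , avg(total_processing_time)
              FILTER (WHERE tdr_abort_flag <> 'AF_ABORT')              AS avg_ne        -- 1500: rows 1, 2 (NULL flag dropped)
         , avg(total_processing_time)
              FILTER (WHERE tdr_abort_flag IS DISTINCT FROM 'AF_ABORT') AS avg_distinct -- 2000: rows 1, 2, 4
         , avg(CASE WHEN tdr_abort_flag <> 'AF_ABORT'
                    THEN total_processing_time END)                    AS avg_case      -- 1500: same rows as avg_ne
    FROM   tdrm;

    -- Both expressions build the same interval; the second avoids the text round trip.
    SELECT (1500 || ' millisecond')::interval                   AS via_concat_cast
         , 1500 * interval '1 millisecond'                      AS via_multiplication
         , to_char(1500 * interval '1 millisecond', 'MI:SS.MS') AS formatted;  -- '00:01.500'
    ```

    The avg(CASE ...) column is the pre-9.4 way to get the same result as FILTER. It works because avg() ignores NULL input, which is what the CASE without ELSE produces for excluded rows.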

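    But after adding a FILTER like this, the expression can produce NULL values after all: if every row in a group is filtered out, avg() has no input and returns NULL, even though count(tdrm.dbid) > 0. Your requirements are fuzzy on this point. You may really want one of the following: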

    Average of total_processing_time where tdr_abort_flag <> 'AF_ABORT'. Default to '00:00.000' if the result is NULL for any reason:

    COALESCE(to_char(avg(tdrm.total_processing_time) FILTER (WHERE tdrm.tdr_abort_flag <> 'AF_ABORT')
                   * interval '1 millisecond'
                   , 'MI:SS.MS')
           , '00:00.000') AS average_decision_time
    

    Or:

    Average of total_processing_time where tdr_abort_flag <> 'AF_ABORT', but only if count(tdrm.dbid) > 0. Default to '00:00.000' if the result is NULL for any reason:

    CASE WHEN count(tdrm.dbid) > 0
       THEN COALESCE(to_char(avg(tdrm.total_processing_time) FILTER (WHERE tdrm.tdr_abort_flag <> 'AF_ABORT')
                           * interval '1 millisecond'
                           , 'MI:SS.MS')
                   , '00:00.000')
       ELSE '00:00.000'
    END AS average_decision_time
    

    Or:

    Average of total_processing_time where tdr_abort_flag <> 'AF_ABORT', but only if count(tdrm.dbid) > 0 for rows where tdr_abort_flag <> 'AF_ABORT'. Else default to '00:00.000':

    CASE WHEN count(tdrm.dbid) FILTER (WHERE tdrm.tdr_abort_flag <> 'AF_ABORT') > 0
       THEN to_char(avg(tdrm.total_processing_time) FILTER (WHERE tdrm.tdr_abort_flag <> 'AF_ABORT')
                  * interval '1 millisecond'
                  , 'MI:SS.MS')
       ELSE '00:00.000'
    END AS average_decision_time
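
    To see where the variants differ, here is a rough sketch on invented data. The users and tdrm CTEs, the usernames, and the text flag values are all hypothetical. The LEFT JOIN mimics your query, so a user without items still produces a group:

    ```sql
    WITH users(username) AS (
       VALUES ('mixed'), ('no_items'), ('only_aborted')
       )
    , tdrm(dbid, username, tdr_abort_flag, total_processing_time) AS (
       VALUES (1, 'only_aborted', 'AF_ABORT'  , 9000)
            , (2, 'mixed'       , 'AF_ABORT'  , 9000)
            , (3, 'mixed'       , 'AF_SUCCESS', 1500)
       )
    SELECT u.username
         , count(t.dbid) AS items_seen
           -- CASE on the unfiltered count, no COALESCE:
           -- NULL for 'only_aborted', because count is 1 but avg() sees no rows
         , CASE WHEN count(t.dbid) > 0
              THEN to_char(avg(t.total_processing_time)
                              FILTER (WHERE t.tdr_abort_flag <> 'AF_ABORT')
                         * interval '1 millisecond', 'MI:SS.MS')
              ELSE '00:00.000'
           END AS case_only
           -- COALESCE catches every NULL, whatever its cause
         , COALESCE(to_char(avg(t.total_processing_time)
                               FILTER (WHERE t.tdr_abort_flag <> 'AF_ABORT')
                          * interval '1 millisecond', 'MI:SS.MS')
                  , '00:00.000') AS with_coalesce
    FROM   users u
    LEFT   JOIN tdrm t USING (username)
    GROUP  BY u.username
    ORDER  BY u.username;
    ```

    For 'mixed' both columns show '00:01.500', and for 'no_items' both show '00:00.000'. Only 'only_aborted' differs: case_only is NULL while with_coalesce is '00:00.000'. The last variant above (CASE on the filtered count) should match with_coalesce here, as long as total_processing_time itself is never NULL.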
    

    You mentioned that you are "new to PostgreSQL". Let me add this: a crystal-clear definition of the problem is more than 50 % of the solution. That is true in many areas, but certainly with SQL.