hadoophivehiveqlquery-performanceapache-tez

Reduce the execution time of large query


My query takes 30+ minutes to process this query. It indeed works on very large data set, however I may be missing something basic which can reduce the execution time.

Query works on many reducer stages each using 1000+ reducers. Runs on Tez engine.

I tried to enable CBO but no luck, also tried to limit reducers to 500 but execution time remains high.

select itt.tr_date, sum (bkt_sum_pc) as pts 
from itops_trxn itt,
( select acttrxnID, max(act_cmp_id) as act_cmp_id 
   from itops_trxn_act a, ll_act_act_trxn b where a.act_trxn_ID = b.ACOUNTtrxnID group by  acttrxnID 
) A, 
(select cmp_id, max (cmp_name) as name 
   from itops_offer group by  cmp_id
) c 
where itt.acttrxnID = A.acttrxnID and act_cmp_id = c.cmp_id
and itt.type = 'ajstmnt' 
and itt.event_header_event_name NOT IN ('composite.sys.act.merge', 'pos.sys.identity', 'composite.sys.act.pcmerge') 
and itt.event_atomic_operation_type  = 'CT'
and itt.tr_date >='2018-10-31' 
group by itt.tr_date, channel, location_storeparentid, meta_trxnreason,  act_cmp_id,name; 

Solution

  • Rewrite joins explicitly and move these conditions

    where itt.acttrxnID = A.acttrxnID and act_cmp_id = c.cmp_id

    to the join ON clause:

    select itt.tr_date, sum (bkt_sum_pc) as pts 
    from itops_trxn itt
    INNER JOIN
    ( select acttrxnID, max(act_cmp_id) as act_cmp_id 
       from itops_trxn_act a, ll_act_act_trxn b where a.act_trxn_ID = b.ACOUNTtrxnID group by  acttrxnID 
    ) A           ON itt.acttrxnID = A.acttrxnID
    INNER JOIN 
    (select cmp_id, max (cmp_name) as name 
       from itops_offer group by  cmp_id
    ) c           ON A.act_cmp_id = c.cmp_id
    where itt.type = 'ajstmnt' 
    and itt.event_header_event_name NOT IN ('composite.sys.act.merge', 'pos.sys.identity', 'composite.sys.act.pcmerge') 
    and itt.event_atomic_operation_type  = 'CT'
    and itt.tr_date >='2018-10-31' 
    group by itt.tr_date, channel, location_storeparentid, meta_trxnreason,  act_cmp_id,name;