performancehivehadoop-yarnquery-tuningapache-tez

need help in re-writing this query, which uses same data set multiple times, as per explain plan


We have a query run by our development team that's heavy in resources and looking at the explain plan, it looks like its uses the same data set multiple times. Is there anyway we can re-write this query.

Now, i tried to replace the co-related query with direct join but still the multiple co-related queries look the same apart from one minor difference.

select tb2.mktg_id, mktg_cd , count(distinct tb2.conf_id) 
  from
(select conf_id, count(distinct c.mktg_id) as num_cpg 
   from acc_latst c, off_latst ot 
  where c.mktg_id = ot.mktg_id and c.bus_eff_dt > '2019-01-01' and to_date(strt_tms) = '2019-01-10'  
  group by conf_id 
 having count(distinct c.mktg_id) >1 
)tb1,
(select distinct conf_id, c.mktg_id, mktg_cd 
   from acc_latst c, off_latst ot 
  where c.mktg_id = ot.mktg_id and c.bus_eff_dt > '2019-01-01' and to_date(strt_tms) = '2019-01-10'
)tb2
  where tb1.conf_id = tb2.conf_id group by tb2.mktg_id, mktg_cd 

Solution

  • One way is using CTE -

    with res1 as 
    (
    select distinct conf_id, c.mktg_id, mktg_cd 
       from acc_latst c, off_latst ot 
      where c.mktg_id = ot.mktg_id and c.bus_eff_dt > '2019-01-01' and to_date(strt_tms) = '2019-01-10'
    )
    ,res2 as
    (
    select conf_id, count(distinct c.mktg_id) as num_cpg
    from res1 group by conf_id having count(distinct c.mktg_id) > 1
    )
    select res1.mktg_id, mktg_cd, count(distinct res1.conf_id)  from res1 t1 inner join res2 t2 on t1.conf_id=t2.conf_id group by res1.mktg_id, mktg_cd;
    

    If the query is still slow, could you provide table and partition details.